Class StringDistribution
Represents a distribution over strings that uses a weighted finite state automaton as the underlying weight function.
Namespace: Microsoft.ML.Probabilistic.Distributions
Assembly: Microsoft.ML.Probabilistic.dll
Syntax
[Serializable]
[DataContract]
[Quality(QualityBand.Preview)]
public class StringDistribution : SequenceDistribution<string, char, ImmutableDiscreteChar, StringManipulator, StringAutomaton, WeightFunctions<string, char, ImmutableDiscreteChar, StringManipulator, StringAutomaton>.MultiRepresentationWeightFunction<StringDictionaryWeightFunction>, WeightFunctions<string, char, ImmutableDiscreteChar, StringManipulator, StringAutomaton>.MultiRepresentationWeightFunction<StringDictionaryWeightFunction>.Factory, StringDistribution>, IDistribution<string>, IDistribution, ICloneable, Diffable, SettableToUniform, HasPoint<string>, CanGetLogProb<string>, SettableTo<StringDistribution>, SettableToProduct<StringDistribution>, SettableToProduct<StringDistribution, StringDistribution>, SettableToRatio<StringDistribution>, SettableToRatio<StringDistribution, StringDistribution>, SettableToPower<StringDistribution>, CanGetLogAverageOf<StringDistribution>, CanGetLogAverageOfPower<StringDistribution>, CanGetAverageLog<StringDistribution>, SettableToWeightedSumExact<StringDistribution>, SettableToWeightedSum<StringDistribution>, SettableToPartialUniform<StringDistribution>, CanGetLogNormalizer, Sampleable<string>
Methods
AppendInPlace(DiscreteChar, Int32)
Replaces the current distribution with a distribution over concatenations of a sequence from the current distribution and the given element.
Declaration
public void AppendInPlace(DiscreteChar element, int group = 0)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | element | The element to append. |
Int32 | group | The group for the appended element. |
Remarks
The result is equivalent to the distribution produced by the following sampling procedure:
- Sample a random sequence from the current distribution.
- Append the given element to the sampled sequence and output the result.
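The sampling procedure above can be illustrated with a minimal sketch. It assumes that DiscreteChar.PointMass(char) is available in the library version in use (it is not documented on this page) and that GetLogProb is provided through the CanGetLogProb<string> interface listed in the class syntax.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

// Start from a point mass on the string "ab".
StringDistribution dist = StringDistribution.String("ab");

// Append a character distribution that puts all its mass on 'c'.
// Assumption: DiscreteChar.PointMass(char) exists in this library version.
dist.AppendInPlace(DiscreteChar.PointMass('c'));

// Every sequence drawn from the original distribution now has 'c' appended,
// so "abc" should be in the support and "ab" should not.
Console.WriteLine(dist.GetLogProb("abc"));
Console.WriteLine(dist.GetLogProb("ab"));
```

If the call behaves as the remarks describe, the first value is finite and the second is negative infinity.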
Capitalized(Int32, Nullable<Int32>, Boolean)
Creates a uniform distribution over strings that start with an upper case letter followed by one or more letters, with length within the given bounds.
If maxLength is set to null, there will be no upper bound on the length, and the resulting distribution will thus be improper.
Declaration
public static StringDistribution Capitalized(int minLength = 2, int? maxLength = null, bool allowUpperAfterFirst = false)
Parameters
Type | Name | Description |
---|---|---|
Int32 | minLength | The minimum possible string length. Defaults to 2. |
Nullable<Int32> | maxLength | The maximum possible sequence length, or null for no upper bound on length. Defaults to null. |
Boolean | allowUpperAfterFirst | Whether to allow upper case letters after the initial upper case letter. If false, only lower case letters will be allowed. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
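As a hedged sketch of how the length bounds affect the result, the following compares a bounded and an unbounded call. IsProper() is assumed here to be an inherited member of the base sequence distribution class; it is not documented on this page.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

// Bounded: capitalized strings of 2 to 5 characters.
StringDistribution bounded = StringDistribution.Capitalized(minLength: 2, maxLength: 5);

// Unbounded: maxLength defaults to null, so the documentation says this is improper.
StringDistribution unbounded = StringDistribution.Capitalized();

// Assumption: IsProper() is inherited from the base sequence distribution class.
Console.WriteLine(bounded.IsProper());
Console.WriteLine(unbounded.IsProper());

// "Hello" starts with an upper case letter; "hello" does not.
Console.WriteLine(bounded.GetLogProb("Hello"));
Console.WriteLine(bounded.GetLogProb("hello"));
```

With allowUpperAfterFirst left at false, a string such as "HeLLo" should also fall outside the support.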
CaseInvariant(String)
Creates a uniform distribution over all strings that are case-invariant matches of the specified string.
Declaration
public static StringDistribution CaseInvariant(string template)
Parameters
Type | Name | Description |
---|---|---|
String | template | The string to match. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
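A small sketch of what "case-invariant match" means in practice, assuming GetLogProb is available through the CanGetLogProb<string> interface listed in the class syntax. The candidate strings are illustrative only.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

// All case variants of "ab" should be equally likely; other strings should have no mass.
StringDistribution dist = StringDistribution.CaseInvariant("ab");

foreach (string candidate in new[] { "ab", "Ab", "aB", "AB", "ac" })
{
    // Log-probability of each candidate under the distribution.
    Console.WriteLine($"{candidate}: {dist.GetLogProb(candidate)}");
}
```

The four case variants are expected to print the same value, while "ac" is expected to print negative infinity.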
Char(DiscreteChar)
Creates a distribution over strings of length 1 induced by a given distribution over characters. This method is an alias for SingleElement(TElementDistribution).
Declaration
public static StringDistribution Char(DiscreteChar characterDist)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | characterDist | The distribution over characters. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
Remarks
The distribution created by this method can differ from the result of Repeat(TThis, Int32, Nullable<Int32>) with both the minimum and maximum length set to 1, since the latter always creates a partial uniform distribution.
Char(ImmutableDiscreteChar)
Creates a distribution over strings of length 1 induced by a given distribution over characters. This method is an alias for SingleElement(TElementDistribution).
Declaration
public static StringDistribution Char(ImmutableDiscreteChar characterDist)
Parameters
Type | Name | Description |
---|---|---|
ImmutableDiscreteChar | characterDist | The distribution over characters. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
Remarks
The distribution created by this method can differ from the result of Repeat(TThis, Int32, Nullable<Int32>) with both the minimum and maximum length set to 1, since the latter always creates a partial uniform distribution.
Char(Char)
Creates a distribution which puts all mass on a string containing only a given character. This method is an alias for SingleElement(TElement).
Declaration
public static StringDistribution Char(char ch)
Parameters
Type | Name | Description |
---|---|---|
Char | ch | The character. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
Digits(Int32, Nullable<Int32>, DistributionKind)
Creates a uniform distribution over strings of digits, with length within the given bounds.
If maxLength is set to null, there will be no upper bound on the length, and the resulting distribution will thus be improper.
Declaration
public static StringDistribution Digits(int minLength = 1, int? maxLength = null, DistributionKind uniformity = DistributionKind.UniformOverValue)
Parameters
Type | Name | Description |
---|---|---|
Int32 | minLength | The minimum possible string length. Defaults to 1. |
Nullable<Int32> | maxLength | The maximum possible sequence length, or null for no upper bound on length. Defaults to null. |
DistributionKind | uniformity | The type of uniformity. Defaults to UniformOverValue. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
Letters(Int32, Nullable<Int32>)
Creates a uniform distribution over strings of lowercase and uppercase letters, with length within the given bounds.
If maxLength is set to null, there will be no upper bound on the length, and the resulting distribution will thus be improper.
Declaration
public static StringDistribution Letters(int minLength = 1, int? maxLength = null)
Parameters
Type | Name | Description |
---|---|---|
Int32 | minLength | The minimum possible string length. Defaults to 1. |
Nullable<Int32> | maxLength | The maximum possible sequence length, or null for no upper bound on length. Defaults to null. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
LettersOrDigits(Int32, Nullable<Int32>)
Creates a uniform distribution over strings of digits, lowercase and uppercase letters, with length within the given bounds.
If maxLength is set to null, there will be no upper bound on the length, and the resulting distribution will thus be improper.
Declaration
public static StringDistribution LettersOrDigits(int minLength = 1, int? maxLength = null)
Parameters
Type | Name | Description |
---|---|---|
Int32 | minLength | The minimum possible string length. Defaults to 1. |
Nullable<Int32> | maxLength | The maximum possible sequence length, or null for no upper bound on length. Defaults to null. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
Lower(Int32, Nullable<Int32>)
Creates a uniform distribution over strings of lowercase letters, with length within the given bounds.
If maxLength is set to null, there will be no upper bound on the length, and the resulting distribution will thus be improper.
Declaration
public static StringDistribution Lower(int minLength = 1, int? maxLength = null)
Parameters
Type | Name | Description |
---|---|---|
Int32 | minLength | The minimum possible string length. Defaults to 1. |
Nullable<Int32> | maxLength | The maximum possible sequence length, or null for no upper bound on length. Defaults to null. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
String(String)
Creates a point mass distribution. This method is an alias for PointMass(TSequence).
Declaration
public static StringDistribution String(string str)
Parameters
Type | Name | Description |
---|---|---|
String | str | The point. |
Returns
Type | Description |
---|---|
StringDistribution | The created point mass distribution. |
ToRegex()
Creates a regex pattern for the current string distribution. The pattern can then be used with the standard .NET Regex class.
Declaration
public string ToRegex()
Returns
Type | Description |
---|---|
String | A regex pattern. |
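A minimal sketch of passing the generated pattern to the .NET Regex class. The exact text of the pattern, including whether it is anchored to the start and end of the input, is not specified on this page, so the output is printed rather than assumed.

```csharp
using System;
using System.Text.RegularExpressions;
using Microsoft.ML.Probabilistic.Distributions;

// A distribution over digit strings of length 2 to 4.
StringDistribution dist = StringDistribution.Digits(minLength: 2, maxLength: 4);

// Convert the distribution's support into a regex pattern string.
string pattern = dist.ToRegex();
Console.WriteLine(pattern);

// Use the pattern with the standard Regex class.
// Whether partial matches succeed depends on how the pattern is anchored.
Console.WriteLine(Regex.IsMatch("123", pattern));
```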
ToString()
Creates a string representation of the current instance.
Declaration
public override string ToString()
Returns
Type | Description |
---|---|
String | A string representation of the current instance. |
Upper(Int32, Nullable<Int32>)
Creates a uniform distribution over strings of uppercase letters, with length within the given bounds.
If maxLength is set to null, there will be no upper bound on the length, and the resulting distribution will thus be improper.
Declaration
public static StringDistribution Upper(int minLength = 1, int? maxLength = null)
Parameters
Type | Name | Description |
---|---|---|
Int32 | minLength | The minimum possible string length. Defaults to 1. |
Nullable<Int32> | maxLength | The maximum possible sequence length, or null for no upper bound on length. Defaults to null. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
Whitespace(Int32, Nullable<Int32>)
Creates a uniform distribution over strings of whitespace characters (see Whitespace()), with length within the given bounds.
If maxLength is set to null, there will be no upper bound on the length, and the resulting distribution will thus be improper.
Declaration
public static StringDistribution Whitespace(int minLength = 1, int? maxLength = null)
Parameters
Type | Name | Description |
---|---|---|
Int32 | minLength | The minimum possible string length. Defaults to 1. |
Nullable<Int32> | maxLength | The maximum possible sequence length, or null for no upper bound on length. Defaults to null. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
WordChars(Int32, Nullable<Int32>)
Creates a uniform distribution over strings of word characters (see WordChar()), with length within the given bounds.
If maxLength is set to null, there will be no upper bound on the length, and the resulting distribution will thus be improper.
Declaration
public static StringDistribution WordChars(int minLength = 1, int? maxLength = null)
Parameters
Type | Name | Description |
---|---|---|
Int32 | minLength | The minimum possible string length. Defaults to 1. |
Nullable<Int32> | maxLength | The maximum possible sequence length, or null for no upper bound on length. Defaults to null. |
Returns
Type | Description |
---|---|
StringDistribution | The created distribution. |
Operators
Addition(StringDistribution, StringDistribution)
Concatenates the weighted regular languages defined by the given distributions (see Append(TThis, Int32)).
Declaration
public static StringDistribution operator +(StringDistribution first, StringDistribution second)
Parameters
Type | Name | Description |
---|---|---|
StringDistribution | first | The first distribution. |
StringDistribution | second | The second distribution. |
Returns
Type | Description |
---|---|
StringDistribution | The concatenation result. |
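As an illustrative sketch, the operator can combine a fixed prefix with a distribution over names. The strings used are hypothetical, and GetLogProb is assumed to be available through the CanGetLogProb<string> interface listed in the class syntax.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

// A fixed prefix followed by a capitalized word of 2 to 5 characters.
StringDistribution greeting =
    StringDistribution.String("Hello ") + StringDistribution.Capitalized(2, 5);

// "Hello World" is the prefix plus a 5-letter capitalized word.
Console.WriteLine(greeting.GetLogProb("Hello World"));

// "Hello world" fails the capitalization requirement of the second part.
Console.WriteLine(greeting.GetLogProb("Hello world"));
```

The first value is expected to be finite and the second negative infinity, since a string is in the support of the concatenation only if it splits into a part from each operand.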