Struct DiscreteChar
Represents a distribution over characters.
Namespace: Microsoft.ML.Probabilistic.Distributions
Assembly: Microsoft.ML.Probabilistic.dll
Syntax
[Quality(QualityBand.Experimental)]
[Serializable]
public struct DiscreteChar : IDistribution<char>, IDistribution, ICloneable, Diffable, SettableToUniform, HasPoint<char>, CanGetLogProb<char>, SettableTo<DiscreteChar>, SettableToProduct<DiscreteChar>, SettableToProduct<DiscreteChar, DiscreteChar>, SettableToRatio<DiscreteChar>, SettableToRatio<DiscreteChar, DiscreteChar>, SettableToPower<DiscreteChar>, SettableToWeightedSumExact<DiscreteChar>, SettableToWeightedSum<DiscreteChar>, SettableToPartialUniform<DiscreteChar>, CanGetLogAverageOf<DiscreteChar>, CanGetLogAverageOfPower<DiscreteChar>, CanGetAverageLog<DiscreteChar>, CanGetMode<char>, Sampleable<char>, CanEnumerateSupport<char>, ISerializable, IEquatable<DiscreteChar>
Fields
CharRangeEndExclusive
The (exclusive) end of the character range.
Declaration
public const int CharRangeEndExclusive = 65536
Field Value
Type | Description |
---|---|
Int32 |
LetterCharacterRanges
Ranges of letters (for Letter()).
Declaration
public const string LetterCharacterRanges = "AZÀÖØÞazßöøÿ"
Field Value
Type | Description |
---|---|
String |
LowerCaseCharacterRanges
Ranges of lowercase characters (for Lower()).
Declaration
public const string LowerCaseCharacterRanges = "azßöøÿ"
Field Value
Type | Description |
---|---|
String |
MixedCaseCharacterRanges
Ranges of mixed upper and lower case characters, which are merged for computational efficiency reasons.
Declaration
public const string MixedCaseCharacterRanges = ""
Field Value
Type | Description |
---|---|
String |
Remarks
In some Unicode ranges, upper and lower case characters alternate, which makes it very expensive to represent just lower case or just upper case characters using ranges. Instead, we use a loose approximation and allow characters in such ranges to be considered either lower or upper case.
This approximation means that upper and lower case character ranges are no longer mutually exclusive.
OnlyLowerCaseCharacterRanges
Ranges of only lowercase characters, excluding the mixed range.
Declaration
public const string OnlyLowerCaseCharacterRanges = "azßöøÿ"
Field Value
Type | Description |
---|---|
String |
OnlyUpperCaseCharacterRanges
Ranges of only uppercase characters, excluding the mixed range.
Declaration
public const string OnlyUpperCaseCharacterRanges = "AZÀÖØÞ"
Field Value
Type | Description |
---|---|
String |
UpperCaseCharacterRanges
Ranges of uppercase characters (for Upper()).
Declaration
public const string UpperCaseCharacterRanges = "AZÀÖØÞ"
Field Value
Type | Description |
---|---|
String |
Properties
HasLogProbabilityOverride
Gets a value indicating whether this distribution is partial uniform with a log probability override.
Declaration
public bool HasLogProbabilityOverride { get; }
Property Value
Type | Description |
---|---|
Boolean |
IsBroad
Gets a value indicating whether this distribution is broad - i.e. a general class of values.
Declaration
public bool IsBroad { get; }
Property Value
Type | Description |
---|---|
Boolean |
IsDigit
Gets a value indicating whether this distribution equals the distribution created by Digit().
Declaration
public bool IsDigit { get; }
Property Value
Type | Description |
---|---|
Boolean |
IsInitialized
Declaration
public bool IsInitialized { get; }
Property Value
Type | Description |
---|---|
Boolean |
IsLetter
Gets a value indicating whether this distribution equals the distribution created by Letter().
Declaration
public bool IsLetter { get; }
Property Value
Type | Description |
---|---|
Boolean |
IsLetterOrDigit
Gets a value indicating whether this distribution equals the distribution created by LetterOrDigit().
Declaration
public bool IsLetterOrDigit { get; }
Property Value
Type | Description |
---|---|
Boolean |
IsLower
Gets a value indicating whether this distribution equals the distribution created by Lower().
Declaration
public bool IsLower { get; }
Property Value
Type | Description |
---|---|
Boolean |
IsPointMass
Gets a value indicating whether this distribution represents a point mass.
Declaration
public bool IsPointMass { get; }
Property Value
Type | Description |
---|---|
Boolean |
IsUpper
Gets a value indicating whether this distribution equals the distribution created by Upper().
Declaration
public bool IsUpper { get; }
Property Value
Type | Description |
---|---|
Boolean |
IsWordChar
Gets a value indicating whether this distribution equals the distribution created by WordChar().
Declaration
public bool IsWordChar { get; }
Property Value
Type | Description |
---|---|
Boolean |
Item[Char]
Gets the probability of a given character under this distribution.
Declaration
public double this[char value] { get; }
Parameters
Type | Name | Description |
---|---|---|
Char | value | The character. |
Property Value
Type | Description |
---|---|
Double | The probability of the character under this distribution. |
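The indexer and GetLogProb(Char) return the same quantity on different scales. The following is a minimal sketch, assuming the Microsoft.ML.Probabilistic package is referenced; the class name `IndexerExample` is hypothetical and only the members documented on this page are used.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

public static class IndexerExample
{
    public static void Main()
    {
        // Uniform distribution over the ten digits '0'..'9'.
        DiscreteChar digit = DiscreteChar.Digit();

        // The indexer returns a probability; GetLogProb returns its logarithm.
        double p = digit['5'];
        double logP = digit.GetLogProb('5');

        Console.WriteLine($"P('5') = {p}");          // expected to be close to 0.1
        Console.WriteLine($"log P('5') = {logP}");   // expected to be close to log(0.1)
        Console.WriteLine($"P('a') = {digit['a']}"); // 'a' lies outside the support
    }
}
```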
LogProbabilityOverride
Gets a value for the log probability override if it exists.
Declaration
public double? LogProbabilityOverride { get; }
Property Value
Type | Description |
---|---|
Nullable<Double> |
Point
Gets or sets the point mass represented by the distribution.
Declaration
public char Point { get; set; }
Property Value
Type | Description |
---|---|
Char |
Ranges
Gets an array of character ranges with associated probabilities.
Declaration
public ReadOnlyArray<DiscreteChar.CharRange> Ranges { get; }
Property Value
Type | Description |
---|---|
ReadOnlyArray<DiscreteChar.CharRange> | An array of character ranges with associated probabilities. |
Methods
Any()
Creates a uniform distribution over all characters.
Declaration
public static DiscreteChar Any()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
AppendRegex(StringBuilder, Boolean)
Appends a regular expression that represents this character distribution to the supplied string builder.
Declaration
public void AppendRegex(StringBuilder stringBuilder, bool useFriendlySymbols = false)
Parameters
Type | Name | Description |
---|---|---|
StringBuilder | stringBuilder | The string builder to append to. |
Boolean | useFriendlySymbols | Whether to use friendly symbols. |
AppendToString(StringBuilder)
Declaration
public void AppendToString(StringBuilder stringBuilder)
Parameters
Type | Name | Description |
---|---|---|
StringBuilder | stringBuilder |
Clone()
Creates a copy of this distribution.
Declaration
public DiscreteChar Clone()
Returns
Type | Description |
---|---|
DiscreteChar | The created copy. |
Complement()
Creates a distribution which is uniform over all characters that have zero probability under this distribution i.e. that are not 'in' this distribution.
Declaration
public DiscreteChar Complement()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
Remarks
This is useful for defining characters that are not in a particular distribution e.g. not a letter or not a word character.
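As an illustration of the remark above, a "not a letter" distribution can be built from Letter(). This is a sketch under the assumption that the Microsoft.ML.Probabilistic package is referenced; `ComplementExample` is a hypothetical name.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

public static class ComplementExample
{
    public static void Main()
    {
        // Uniform over every character that has zero probability under Letter().
        DiscreteChar notLetter = DiscreteChar.Letter().Complement();

        // A letter should receive no mass, while a digit should receive some.
        Console.WriteLine($"P('a') = {notLetter['a']}");
        Console.WriteLine($"P('7') > 0: {notLetter['7'] > 0}");
    }
}
```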
Create(IEnumerable<DiscreteChar.CharRange>)
Creates a distribution given a list of constant probability character ranges.
Declaration
[Construction(new string[]{"Ranges"})]
public static DiscreteChar Create(IEnumerable<DiscreteChar.CharRange> ranges)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<DiscreteChar.CharRange> | ranges | The constant-probability character ranges. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
Remarks
The probabilities do not need to be normalized. The character ranges do not need to be sorted.
Digit()
Creates a uniform distribution over digits '0'..'9'.
Declaration
[Construction(UseWhen = "IsDigit")]
public static DiscreteChar Digit()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
EnumerateSupport()
Enumerates over the support of the distribution instance.
Declaration
public IEnumerable<char> EnumerateSupport()
Returns
Type | Description |
---|---|
IEnumerable<Char> | The character values with non-zero mass. |
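A short sketch of enumerating the support of a small distribution follows. It assumes the Microsoft.ML.Probabilistic package is referenced; `SupportExample` is a hypothetical name, and the enumeration order is not stated on this page, so the code does not rely on it.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

public static class SupportExample
{
    public static void Main()
    {
        // Uniform over exactly three characters.
        DiscreteChar xyz = DiscreteChar.OneOf('x', 'y', 'z');

        // EnumerateSupport yields only the characters with non-zero mass.
        foreach (char c in xyz.EnumerateSupport())
        {
            Console.WriteLine($"{c}: {xyz[c]}");
        }
    }
}
```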
Equals(DiscreteChar)
Checks whether that is equal to this distribution (i.e. represents the same distribution over characters).
Declaration
public bool Equals(DiscreteChar that)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | that | The object to compare this distribution with. |
Returns
Type | Description |
---|---|
Boolean | true if this distribution is equal to that, false otherwise. |
Equals(Object)
Checks whether obj is equal to this distribution (i.e. represents the same distribution over characters).
Declaration
public override bool Equals(object obj)
Parameters
Type | Name | Description |
---|---|---|
Object | obj | The object to compare this distribution with. |
Returns
Type | Description |
---|---|
Boolean | true if this distribution is equal to obj, false otherwise. |
FromVector(PiecewiseVector)
Creates a distribution from a vector of (unnormalized) character probabilities.
Declaration
public static DiscreteChar FromVector(PiecewiseVector vector)
Parameters
Type | Name | Description |
---|---|---|
PiecewiseVector | vector | The vector of unnormalized probabilities. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
FromVector(Vector)
Creates a distribution from a vector of (unnormalized) character probabilities.
Declaration
public static DiscreteChar FromVector(Vector vector)
Parameters
Type | Name | Description |
---|---|---|
Vector | vector | The vector of unnormalized probabilities. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
GetAverageLog(DiscreteChar)
Computes the expected logarithm of a given distribution under this distribution.
Declaration
public double GetAverageLog(DiscreteChar distribution)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | distribution | The distribution to take the logarithm of. |
Returns
Type | Description |
---|---|
Double |
Remarks
This is the negative of the cross entropy between the two distributions.
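Written out, with p denoting this distribution and q the argument (both symbols are introduced here for illustration), the returned quantity is the following sum over all characters c:

```latex
\mathbb{E}_{p}[\log q] \;=\; \sum_{c} p(c)\,\log q(c)
```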
GetHashCode()
Gets the hash code of this distribution.
Declaration
public override int GetHashCode()
Returns
Type | Description |
---|---|
Int32 | The hash code. |
GetImpliedCharClasses(DiscreteChar.CharClasses)
Declaration
public static DiscreteChar.CharClasses GetImpliedCharClasses(DiscreteChar.CharClasses charClass)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar.CharClasses | charClass |
Returns
Type | Description |
---|---|
DiscreteChar.CharClasses |
GetLogAverageOf(DiscreteChar)
Returns the logarithm of the probability that the current distribution would draw the same sample as a given one.
Declaration
public double GetLogAverageOf(DiscreteChar distribution)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | distribution | The given distribution. |
Returns
Type | Description |
---|---|
Double | The logarithm of the probability that distributions would draw the same sample. |
GetLogAverageOfPower(DiscreteChar, Double)
Computes the log-integral of one distribution times another raised to a power.
Declaration
public double GetLogAverageOfPower(DiscreteChar distribution, double power)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | distribution | The other distribution |
Double | power | The power. |
Returns
Type | Description |
---|---|
Double |
Remarks
This is not the same as GetLogAverageOf(distribution^power) because it includes the normalization constant of distribution^power.
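The distinction in the remark can be sketched as follows, writing p for this distribution, q for the argument and n for power. These symbols, and the reading that the first expression is what the method returns, are assumptions based on the remark rather than statements taken from the implementation.

```latex
% Log-average of p against the unnormalized power q^n:
\log \sum_{c} p(c)\, q(c)^{n}

% Log-average of p against the normalized power, which differs by \log Z_n:
\log \sum_{c} p(c)\, \frac{q(c)^{n}}{Z_n},
\qquad Z_n = \sum_{c} q(c)^{n}
```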
GetLogProb(Char)
Gets the logarithm of the probability of a given character under this distribution.
Declaration
public double GetLogProb(char value)
Parameters
Type | Name | Description |
---|---|---|
Char | value | The character. |
Returns
Type | Description |
---|---|
Double | The logarithm of the probability of the character under this distribution. |
GetMode()
Gets the mode of this distribution.
Declaration
public char GetMode()
Returns
Type | Description |
---|---|
Char | The mode. |
GetProbs()
Gets a vector of character probabilities under this distribution.
Declaration
public PiecewiseVector GetProbs()
Returns
Type | Description |
---|---|
PiecewiseVector | A vector of character probabilities. |
InRange(Char, Char)
Creates a uniform distribution over characters in a given range.
Declaration
public static DiscreteChar InRange(char start, char end)
Parameters
Type | Name | Description |
---|---|---|
Char | start | The start of the range (inclusive). |
Char | end | The end of the range (inclusive). |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
InRanges(Char[])
Creates a distribution which is uniform over values in multiple ranges specified by pairs of start and end values. These pairs are specified as adjacent values in an array whose length must therefore be even.
Declaration
public static DiscreteChar InRanges(params char[] startEndPairs)
Parameters
Type | Name | Description |
---|---|---|
Char[] | startEndPairs | The array of range starts and ends. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
InRanges(IEnumerable<Char>)
Creates a distribution which is uniform over values in multiple ranges specified by pairs of start and end values. These pairs are specified as adjacent values in a sequence whose length must therefore be even.
Declaration
public static DiscreteChar InRanges(IEnumerable<char> startEndPairs)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<Char> | startEndPairs | The sequence of range starts and ends. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
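The start/end pairing used by both InRanges overloads can be sketched as below. The example assumes the Microsoft.ML.Probabilistic package is referenced; `InRangesExample` is a hypothetical name. Passing a string selects the IEnumerable<Char> overload, since a string is a sequence of characters.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

public static class InRangesExample
{
    public static void Main()
    {
        // Two ranges given as adjacent (start, end) pairs: 'a'..'z' and '0'..'9'.
        DiscreteChar lowerOrDigit = DiscreteChar.InRanges('a', 'z', '0', '9');
        Console.WriteLine($"P('q') = {lowerOrDigit['q']}");
        Console.WriteLine($"P('Q') = {lowerOrDigit['Q']}"); // outside both ranges

        // A string of even length is read the same way, two characters per range.
        DiscreteChar letters = DiscreteChar.InRanges(DiscreteChar.LetterCharacterRanges);
        Console.WriteLine($"P('A') > 0: {letters['A'] > 0}");
    }
}
```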
IsPartialUniform()
Checks whether the distribution is uniform over its support.
Declaration
public bool IsPartialUniform()
Returns
Type | Description |
---|---|
Boolean | true if the distribution is uniform over its support, false otherwise. |
IsUniform()
Checks whether this distribution is a uniform distribution over all characters.
Declaration
public bool IsUniform()
Returns
Type | Description |
---|---|
Boolean | true if this distribution is a uniform distribution over all possible characters, false otherwise. |
Letter()
Creates a uniform distribution over letters.
Declaration
[Construction(UseWhen = "IsLetter")]
public static DiscreteChar Letter()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
LetterOrDigit()
Creates a uniform distribution over letters and '0'..'9'.
Declaration
[Construction(UseWhen = "IsLetterOrDigit")]
public static DiscreteChar LetterOrDigit()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
Lower()
Creates a uniform distribution over lowercase letters.
Declaration
[Construction(UseWhen = "IsLower")]
public static DiscreteChar Lower()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
MaxDiff(Object)
Gets the maximum difference between the character probabilities under this distribution and a given one.
Declaration
public double MaxDiff(object distribution)
Parameters
Type | Name | Description |
---|---|---|
Object | distribution | The distribution to compute the maximum probability difference with. |
Returns
Type | Description |
---|---|
Double | The computed maximum probability difference. |
NonWordChar()
Creates a uniform distribution over all characters except word characters (letters, digits and '_').
Declaration
public static DiscreteChar NonWordChar()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
OneOf(Char[])
Creates a distribution which is uniform over the specified set of characters.
Declaration
public static DiscreteChar OneOf(params char[] chars)
Parameters
Type | Name | Description |
---|---|---|
Char[] | chars | The characters. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
OneOf(IEnumerable<Char>)
Creates a distribution which is uniform over the specified set of characters.
Declaration
public static DiscreteChar OneOf(IEnumerable<char> chars)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<Char> | chars | The characters. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
PointMass(Char)
Creates a point mass character distribution.
Declaration
public static DiscreteChar PointMass(char point)
Parameters
Type | Name | Description |
---|---|---|
Char | point | The point. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
Read(Func<Int32>, Func<Double>)
Reads a discrete character.
Declaration
public static DiscreteChar Read(Func<int> readInt32, Func<double> readDouble)
Parameters
Type | Name | Description |
---|---|---|
Func<Int32> | readInt32 | |
Func<Double> | readDouble |
Returns
Type | Description |
---|---|
DiscreteChar |
Sample()
Draws a sample from the distribution.
Declaration
public char Sample()
Returns
Type | Description |
---|---|
Char | The drawn sample. |
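A minimal sampling sketch follows, assuming the Microsoft.ML.Probabilistic package is referenced; `SampleExample` is a hypothetical name. The output is random, so no particular characters are shown.

```csharp
using System;
using System.Text;
using Microsoft.ML.Probabilistic.Distributions;

public static class SampleExample
{
    public static void Main()
    {
        DiscreteChar upper = DiscreteChar.Upper();

        // Draw a handful of independent samples and collect them in a string.
        var builder = new StringBuilder();
        for (int i = 0; i < 8; i++)
        {
            builder.Append(upper.Sample());
        }

        Console.WriteLine(builder.ToString());
    }
}
```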
Sample(Char)
Draws a sample from the distribution.
Declaration
public char Sample(char result)
Parameters
Type | Name | Description |
---|---|---|
Char | result | A pre-allocated storage for the sample (will be ignored). |
Returns
Type | Description |
---|---|
Char | The drawn sample. |
SetTo(DiscreteChar)
Sets this distribution to be equal to a given distribution.
Declaration
public void SetTo(DiscreteChar distribution)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | distribution | The distribution to set this distribution to. |
SetToPartialUniform()
Sets the distribution to be uniform over its support.
Declaration
public void SetToPartialUniform()
SetToPartialUniformOf(DiscreteChar)
Sets the distribution to be uniform over the support of a given distribution.
Declaration
public void SetToPartialUniformOf(DiscreteChar distribution)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | distribution | The distribution whose support will be used to set up the current distribution. |
SetToPartialUniformOf(DiscreteChar, Nullable<Double>)
Sets the distribution to be uniform over the support of a given distribution.
Declaration
public void SetToPartialUniformOf(DiscreteChar distribution, double? logProbabilityOverride)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | distribution | The distribution whose support will be used to set up the current distribution. |
Nullable<Double> | logProbabilityOverride | An optional value to use in place of the log probability calculated against this distribution. If this is set, the distribution will not be normalized, i.e. the probabilities will not sum to 1 over the support. |
Remarks
Overriding the log probability calculation in this way is useful within the context of using StringDistribution to create more realistic language model priors. Distributions with this override are always uncached.
SetToPower(DiscreteChar, Double)
Sets the current distribution to a given distribution raised to a given power.
Declaration
public void SetToPower(DiscreteChar distribution, double power)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | distribution | The distribution to raise to the power. |
Double | power | The power. |
SetToProduct(DiscreteChar, DiscreteChar)
Sets this distribution to a product of a given pair of distributions.
Declaration
public void SetToProduct(DiscreteChar distribution1, DiscreteChar distribution2)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | distribution1 | The first distribution. |
DiscreteChar | distribution2 | The second distribution. |
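The product of two range distributions concentrates on their overlap. The sketch below assumes the Microsoft.ML.Probabilistic package is referenced and that the result is stored normalized, which this page does not state; `ProductExample` is a hypothetical name.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

public static class ProductExample
{
    public static void Main()
    {
        DiscreteChar first = DiscreteChar.InRange('a', 'm');
        DiscreteChar second = DiscreteChar.InRange('h', 'z');

        // The receiver is overwritten, so its initial value does not matter.
        DiscreteChar product = DiscreteChar.Uniform();
        product.SetToProduct(first, second);

        // Only 'h'..'m' (6 characters) is in both ranges; if the product is
        // normalized, each of them would have probability close to 1/6.
        Console.WriteLine($"P('h') = {product['h']}");
        Console.WriteLine($"P('a') = {product['a']}"); // in first only
        Console.WriteLine($"P('z') = {product['z']}"); // in second only
    }
}
```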
SetToRatio(DiscreteChar, DiscreteChar, Boolean)
Sets the current distribution to the ratio of a given pair of distributions.
Declaration
public void SetToRatio(DiscreteChar numerator, DiscreteChar denominator, bool forceProper = false)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | numerator | The numerator in the ratio. |
DiscreteChar | denominator | The denominator in the ratio. |
Boolean | forceProper | Specifies whether the ratio must be proper. |
SetToSum(Weight, DiscreteChar, Weight, DiscreteChar)
Sets this distribution to a weighted sum of a given pair of distributions.
Declaration
public void SetToSum(Weight weight1, DiscreteChar distribution1, Weight weight2, DiscreteChar distribution2)
Parameters
Type | Name | Description |
---|---|---|
Weight | weight1 | The weight of the first distribution. |
DiscreteChar | distribution1 | The first distribution. |
Weight | weight2 | The weight of the second distribution. |
DiscreteChar | distribution2 | The second distribution. |
SetToSum(Double, DiscreteChar, Double, DiscreteChar)
Sets this distribution to a weighted sum of a given pair of distributions.
Declaration
public void SetToSum(double weight1, DiscreteChar distribution1, double weight2, DiscreteChar distribution2)
Parameters
Type | Name | Description |
---|---|---|
Double | weight1 | The weight of the first distribution. |
DiscreteChar | distribution1 | The first distribution. |
Double | weight2 | The weight of the second distribution. |
DiscreteChar | distribution2 | The second distribution. |
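A weighted sum yields a mixture of the two distributions. The sketch below assumes the Microsoft.ML.Probabilistic package is referenced and that the weights are used as given (they already sum to 1 here); `MixtureExample` is a hypothetical name.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;

public static class MixtureExample
{
    public static void Main()
    {
        // The receiver is overwritten with an equal mixture of digits and lowercase letters.
        DiscreteChar mixture = DiscreteChar.Uniform();
        mixture.SetToSum(0.5, DiscreteChar.Digit(), 0.5, DiscreteChar.Lower());

        // Under the stated assumption, P('3') would be 0.5 * 0.1 = 0.05.
        Console.WriteLine($"P('3') = {mixture['3']}");
        Console.WriteLine($"P('k') = {mixture['k']}");
        Console.WriteLine($"P('K') = {mixture['K']}"); // in neither component
    }
}
```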
SetToUniform()
Sets this distribution to a uniform distribution over all characters.
Declaration
public void SetToUniform()
SwapWith(DiscreteChar)
Swaps this distribution with a given one.
Declaration
public void SwapWith(DiscreteChar distribution)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | distribution | The distribution to swap this distribution with. |
ToLower(DiscreteChar)
Declaration
public static DiscreteChar ToLower(DiscreteChar unnormalizedCharDist)
Parameters
Type | Name | Description |
---|---|---|
DiscreteChar | unnormalizedCharDist |
Returns
Type | Description |
---|---|
DiscreteChar |
ToString()
Gets a string that represents this distribution.
Declaration
public override string ToString()
Returns
Type | Description |
---|---|
String | A string that represents this distribution. |
Uniform()
Creates a uniform distribution over characters.
Declaration
[Construction(UseWhen = "IsUniform")]
public static DiscreteChar Uniform()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
UniformInRange(Char, Char)
Creates a uniform distribution over characters in a given range.
Declaration
public static DiscreteChar UniformInRange(char start, char end)
Parameters
Type | Name | Description |
---|---|---|
Char | start | The start of the range (inclusive). |
Char | end | The end of the range (inclusive). |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
UniformInRanges(Char[])
Creates a distribution which is uniform over values in multiple ranges specified by pairs of start and end values. These pairs are specified as adjacent values in an array whose length must therefore be even.
Declaration
public static DiscreteChar UniformInRanges(params char[] startEndPairs)
Parameters
Type | Name | Description |
---|---|---|
Char[] | startEndPairs | The array of range starts and ends. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
UniformInRanges(IEnumerable<Char>)
Creates a distribution which is uniform over values in multiple ranges specified by pairs of start and end values. These pairs are specified as adjacent values in a sequence whose length must therefore be even.
Declaration
public static DiscreteChar UniformInRanges(IEnumerable<char> startEndPairs)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<Char> | startEndPairs | The sequence of range starts and ends. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
UniformOver(Char[])
Creates a distribution which is uniform over the specified set of characters.
Declaration
public static DiscreteChar UniformOver(params char[] chars)
Parameters
Type | Name | Description |
---|---|---|
Char[] | chars | The characters. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
UniformOver(IEnumerable<Char>)
Creates a distribution which is uniform over the specified set of characters.
Declaration
public static DiscreteChar UniformOver(IEnumerable<char> chars)
Parameters
Type | Name | Description |
---|---|---|
IEnumerable<Char> | chars | The characters. |
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
Upper()
Creates a uniform distribution over uppercase letters.
Declaration
[Construction(UseWhen = "IsUpper")]
public static DiscreteChar Upper()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
Whitespace()
Creates a uniform distribution over whitespace characters ('\t'..'\r', ' ').
Declaration
public static DiscreteChar Whitespace()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
WordChar()
Creates a uniform distribution over word characters (letters, digits and '_').
Declaration
[Construction(UseWhen = "IsWordChar")]
public static DiscreteChar WordChar()
Returns
Type | Description |
---|---|
DiscreteChar | The created distribution. |
Write(Action<Int32>, Action<Double>)
Writes a discrete character.
Declaration
public void Write(Action<int> writeInt32, Action<double> writeDouble)
Parameters
Type | Name | Description |
---|---|---|
Action<Int32> | writeInt32 | |
Action<Double> | writeDouble |
Explicit Interface Implementations
ICloneable.Clone()
Creates a copy of this distribution.
Declaration
object ICloneable.Clone()
Returns
Type | Description |
---|---|
Object | The created copy. |
ISerializable.GetObjectData(SerializationInfo, StreamingContext)
Declaration
void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
Parameters
Type | Name | Description |
---|---|---|
SerializationInfo | info | |
StreamingContext | context |