The format of the DISTRIB function is:
DISTRIB(Control@, MinerState, Probability)
The function supplies a variable with a range of values drawn from a list that is built internally during the Learning phase.
The parameters control the operation of the DISTRIB function:
Control@ - if TRUE, the function responds to the state of the Miner, either storing information or outputting it. A single variable can have many DISTRIB functions, but only one of them can have its Control@ pin TRUE.
MinerState - the state of the Miner: Quiescent, Learning, Running, or an intermediate state. In the Learning state, the values arriving at the value pin are built into a list for later output during the Running state.
Probability - if no Probability is specified externally, its initial value is set to 0..100 (indicating a probability of 100%), and the full range is output at the value pin. Narrowing the range of the Probability narrows the range of alternatives at the value pin, based on frequency of occurrence.
The DISTRIB function handles both string and numeric distributions:
For strings, the actual strings are stored, each with a count of its occurrences. In the Running state, a list of alternative string values is output. Which alternatives appear depends on the Probability: a lower probability threshold prunes the alternatives. As an example, the following strings are read from the database, together with their frequency of occurrence:
String  Count
ABC     25
DEF     12
GHJ      3
XYZ      1
At completion of the learning phase, the strings are ordered in decreasing frequency. A request for all possible alternatives (Probability of 0..100) would result in
ABC, DEF, GHJ, XYZ
whereas a request for Probability of 0..90 would result in
ABC, DEF
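The pruning in this example can be sketched in Python. This is an illustration only, not the DISTRIB implementation: the function name prune_alternatives and the cut-off rule (take strings in decreasing frequency until their cumulative share of all occurrences reaches the upper end of the Probability range) are assumptions chosen because they reproduce the two results above. With 41 occurrences in total, ABC and DEF together account for 37, or about 90.2%.

```python
def prune_alternatives(counts, upper_percent):
    """Return strings, most frequent first, until the cumulative share of
    occurrences reaches upper_percent (assumed rule, not confirmed)."""
    total = sum(counts.values())
    # Order by decreasing frequency, as done at the end of learning.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    cumulative = 0
    for value, count in ordered:
        # Integer comparison avoids floating-point rounding at the boundary.
        if cumulative * 100 >= upper_percent * total:
            break
        selected.append(value)
        cumulative += count
    return selected


counts = {"ABC": 25, "DEF": 12, "GHJ": 3, "XYZ": 1}
print(prune_alternatives(counts, 100))  # ['ABC', 'DEF', 'GHJ', 'XYZ']
print(prune_alternatives(counts, 90))   # ['ABC', 'DEF']
```

Other cut-off rules would fit the same two results, so the sketch should be read as one plausible interpretation.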
If the number of strings would exceed 96, either a catchall string of '*' is used or merging is handled using a HIERARCHY operator.
For integers, the frequency of occurrence of each individual integer is stored. If the number of different integers would exceed 100, integers with a low frequency of occurrence are clumped into ranges. Combining ranges for low-frequency values with single values for high-frequency values keeps the overall number of objects around 100 while maintaining precision.
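One way such clumping could work is sketched below in Python. The source does not describe the actual merging strategy, so everything here is an assumption made for illustration: the name clump_integers, the (low, high, count) bucket layout, and the rule of repeatedly merging the adjacent pair of buckets with the smallest combined count. That rule tends to leave high-frequency integers as single values and absorbs rare ones into ranges.

```python
def clump_integers(counts, max_objects=100):
    """Reduce a {integer: count} mapping to at most max_objects buckets.

    Each bucket is (low, high, count); a single value has low == high.
    The merge rule is an assumed strategy, not the documented one.
    """
    buckets = [(value, value, count) for value, count in sorted(counts.items())]
    limit = max(max_objects, 1)  # never try to go below one bucket
    while len(buckets) > limit:
        # Find the adjacent pair whose combined count is smallest.
        i = min(range(len(buckets) - 1),
                key=lambda k: buckets[k][2] + buckets[k + 1][2])
        low, _, count_a = buckets[i]
        _, high, count_b = buckets[i + 1]
        # Replace the pair with one range covering both.
        buckets[i:i + 2] = [(low, high, count_a + count_b)]
    return buckets


sample = {1: 50, 2: 40, 3: 1, 4: 2, 5: 1, 10: 30}
print(clump_integers(sample, max_objects=4))
# [(1, 1, 50), (2, 2, 40), (3, 5, 4), (10, 10, 30)]
```

In the sample, the frequent values 1, 2 and 10 stay as single values, while the rare values 3, 4 and 5 are clumped into the range 3..5.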
The list of values is normally destroyed during the StartLearn phase, built from scratch during the Learn phase and sorted during the FinishLearn phase. If the list has been re-ordered, the list becomes locked, and only the count is initialised during the StartLearn phase. The list can be sorted and the lock removed using the facilities of Edit Stochastic Operators.
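The list lifecycle described above can be modelled with a small Python class. This is a hypothetical sketch: the class DistribList and its method names are invented for illustration, and two behaviours are assumptions not stated in the text, namely that a locked list skips the sort in FinishLearn, and that a value first seen while the list is locked is appended at the end.

```python
class DistribList:
    """Hypothetical model of the value list kept by one DISTRIB function."""

    def __init__(self):
        self.counts = {}      # value -> count; dict order is the list order
        self.locked = False   # set once the list has been re-ordered by hand

    def start_learn(self):
        if self.locked:
            # Locked: keep the values and their order, reset only the counts.
            self.counts = {value: 0 for value in self.counts}
        else:
            # Normal case: the list is destroyed and rebuilt from scratch.
            self.counts = {}

    def learn(self, value):
        self.counts[value] = self.counts.get(value, 0) + 1

    def finish_learn(self):
        # Assumption: a locked list keeps its manual order.
        if not self.locked:
            self._sort_by_count()

    def reorder(self, new_order):
        """Manual re-ordering; this locks the list."""
        if len(new_order) != len(self.counts) or set(new_order) != set(self.counts):
            raise ValueError("new_order must contain exactly the stored values")
        self.counts = {value: self.counts[value] for value in new_order}
        self.locked = True

    def sort_and_unlock(self):
        """Stands in for the sort and unlock facility of Edit Stochastic Operators."""
        self._sort_by_count()
        self.locked = False

    def _sort_by_count(self):
        self.counts = dict(
            sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        )


d = DistribList()
d.start_learn()
for s in ["DEF", "ABC", "ABC"]:
    d.learn(s)
d.finish_learn()
print(list(d.counts))  # ['ABC', 'DEF']
```

After a call to reorder, a further StartLearn keeps the same entries with zero counts, which is the locked behaviour described in the text.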
Related Operators