Bayesian Logic in Active Structures

Introduction

An active structure is a cognitive model of some area of complex behaviour. The network forming the structure has a complete implementation of the logical connectives of sentential logic, the ANDs, ORs, IF...THENs. These connectives form a logical network, controlling and interacting with other parts of the network carrying numeric, string, object or list information.

Bayesian Logic has been called the Logic of Science, and could be described as an attempt to quantify induction. In simple terms, instead of True and False being discrete logical states, a logical continuum is established between zero and one, where zero represents a statement that is False, or has a probability of zero of being True. A statement that would be true 99% of the time has a probability of 0.99, although the probability should not just be viewed as a probability of occurrence on repeated trials - we also need to deal with situations where there can be only one trial.

A fundamental problem with a single measure is that there are two dimensions - existence and validity - and a single measure smears the two effects together. Imagine a tossed coin which lands on edge, so the outcome is neither heads nor tails. Imagine a clinical trial, with 100 patients receiving a drug and 100 patients receiving a placebo, where five patients die of unrelated causes during the trial. In both cases, whether an outcome exists at all is a separate question from how probable or valid it is. Existence needs to be evaluated separately from validity, a point the Bayesian approach overlooks (See Existence).
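As a rough illustration of keeping the two dimensions apart, here is a minimal sketch in Python; the LogicalValue class and the numbers are invented for illustration and are not part of the Orion model.

```python
from dataclasses import dataclass

@dataclass
class LogicalValue:
    """A two-dimensional logical value, keeping existence apart from validity.

    existence: probability that the outcome exists at all in this context.
    validity:  probability that the statement is true, given that it exists.
    """
    existence: float
    validity: float

# A coin that may land on edge: the outcome "heads or tails" occasionally
# fails to exist, quite independently of how fair the coin is.
coin_heads = LogicalValue(existence=0.99, validity=0.5)

# A trial patient who died of an unrelated cause contributes no data point:
# existence is zero, and that fact should not be smeared into validity.
withdrawn_patient = LogicalValue(existence=0.0, validity=0.5)
```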

Bayesian probabilities can be operated on using the rules for the Sum and Product of probabilities, to produce values for the ANDing and ORing of logical values. The equation for the sum of two probabilities has a term representing the cross-correlation between the two, and this cross-correlation can be difficult to determine. Many systems have been "built on Bayesian principles", where the programmer or other expert puts in some rigid correlation as a way of interpreting evidence of a particular state. An example might be whether a person threw a ball - if they were facing the right way, and not running, and so on, then the probability is some fixed value. These toy uses, with shallow input, don't lead very far. Bayesian logical operators deep in the structure need rather more than this to function properly. They can obtain this information through activation using Bayesian values. These values have an advantage, in that any value is consistent with any other - a value of 0.01 is not inconsistent with 0.99. This fact allows us to excite the structure with Bayesian values and observe what returns.
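As a minimal sketch of where that cross-correlation term sits, assuming it is expressed as an additive adjustment to the independent joint probability and clamped to the feasible range; the function names and the treatment of correlation are illustrative, not the Orion implementation.

```python
def prob_and(p_a, p_b, correlation=0.0):
    """Product rule with a correlation adjustment.

    With correlation 0 the events are treated as independent; the adjustment
    shifts the joint probability, clamped to the range allowed by p_a and p_b.
    """
    joint = p_a * p_b + correlation
    return min(max(joint, max(0.0, p_a + p_b - 1.0)), min(p_a, p_b))

def prob_or(p_a, p_b, correlation=0.0):
    """Sum rule: P(A or B) = P(A) + P(B) - P(A and B).

    The subtracted joint term is where the cross-correlation enters, and it
    is the term that is hard to know in practice.
    """
    return p_a + p_b - prob_and(p_a, p_b, correlation)

print(round(prob_or(0.9, 0.1), 3))        # treated as independent: 0.91
print(round(prob_or(0.9, 0.1, 0.01), 3))  # positively correlated: 0.9
```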

Rather than the Logic of Science, Bayesian Logic is more like an impoverished form of scientific logic, one which attempts to capture in a single number the complexities of some behaviour, then propagate only that number. Scientists might be expected to propagate far more complex messages in their analysis - the "roundness" of the behaviour. Active structures can also propagate more complex messages than a single number: they handle existence and validity simultaneously, but in different dimensions, and they can propagate states and values in any direction. Nevertheless, Bayesian Logic is popular - sometimes models are simplified to the point of non-existence and input is restricted to a single number - so we should handle the situation with as much accuracy as possible.

Orion Implementation of Bayesian Logic

We have extended Bayesian logic to include the logic of existence of objects, where True means Does_Exist, False means Does_Not_Exist (the object either does not exist, or is known not to be present in the context), and Bayesian values cover the range between. The value of 0.5 has a special significance - it means the value should be ignored. Bayesian values are used to illuminate models of scientific knowledge, with their causality and associations.

See Logic of Existence
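One possible reading of the 0.5 convention is sketched below: values at exactly 0.5 are dropped as carrying no information, and the remaining values are pooled in odds space. This is only an illustration of the convention, not the Orion combination rule.

```python
import math

def combine_existence(values, ignore=0.5, eps=1e-9):
    """Combine Bayesian existence values, skipping 'no information' inputs.

    A value at 0.5 carries no information and is dropped before combining;
    the remaining values are pooled in odds (logit) space.
    """
    informative = [v for v in values if abs(v - ignore) > eps]
    if not informative:
        return ignore
    logit = sum(math.log(v / (1.0 - v)) for v in informative)
    return 1.0 / (1.0 + math.exp(-logit))

print(round(combine_existence([0.9, 0.5, 0.5]), 2))  # the 0.5 inputs are ignored: 0.9
```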

Orion can implement Bayesian behaviour at its logical connectives using relations to provide the correlation between probability values. The values used in the relations can be nominated by an expert outside the model, they can be mined from data, or they can be generated within the structure based on activation.

Let us begin with an example showing how probability values may be generated in the network. We have a statement

X <= 5

and we have a distribution on X, giving a range of 1 to 10. The statement may be True, it may be False, or it may presently be somewhere in between. We can use the distribution to determine the probability of the statement being True. Let us say the distribution is flat across the integers 1 to 10, so the probability is 0.5. If new information arrives at X, coercing its range to 5..10, the probability falls to about 0.17 (one value in six). This example already tells us the probability values are likely to jump around considerably as constraints cut the ranges on variables.
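A small sketch of the arithmetic, assuming a discrete uniform distribution over the integers in the range:

```python
def prob_leq(values, threshold):
    """Probability that X <= threshold under a flat (discrete uniform) distribution."""
    values = list(values)
    return sum(1 for v in values if v <= threshold) / len(values)

print(round(prob_leq(range(1, 11), 5), 2))  # flat distribution on 1..10: 0.5
print(round(prob_leq(range(5, 11), 5), 2))  # range coerced to 5..10: 0.17
```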

In the rest of the network, influences flow both ways. What would it mean to assert that X <= 5 has a probability of 0.5? Should we adjust the distribution of X so that it has the required shape? Was the assertion only valid when X had the range 1 to 10, or is it valid now that the range on X has been reduced?

We will need to carefully control when, and under what circumstances, actions are taken based on logical probabilities.

Consistent Reasoning is the main reasoning method used on the active structure model. It is a powerful means of arriving at a valid conclusion. We make inferences that reduce the problem space, and as long as we do not encounter inconsistencies we know we have made no mistakes. When dealing with numbers, we reduce the interval, knowing that any inference made using the larger interval will remain true with the smaller interval. With logicals, we assert True or False and observe the consequences. A heuristic, First Fail, is used to speed up the process - we try to cause a failure as quickly as possible.
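A toy sketch of both ingredients - interval narrowing that either shrinks a range or exposes an inconsistency, and a First Fail ordering that tries the most constrained variable first. The functions are illustrative only, not the Orion machinery.

```python
def narrow(interval_a, interval_b):
    """Intersect two numeric intervals; an empty result signals an inconsistency."""
    low, high = max(interval_a[0], interval_b[0]), min(interval_a[1], interval_b[1])
    if low > high:
        raise ValueError("inconsistent constraints")
    return (low, high)

def first_fail_order(variables, domains):
    """First Fail heuristic: visit the most constrained variable first,
    so that an inevitable failure is discovered as early as possible."""
    return sorted(variables, key=lambda v: domains[v][1] - domains[v][0])

domains = {"X": (1, 10), "Y": (4, 5)}
print(narrow(domains["X"], (5, 10)))          # (5, 10): consistent, the interval shrinks
print(first_fail_order(["X", "Y"], domains))  # ['Y', 'X']: narrowest domain tried first
```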

Bayesian logic is all the other way - consecutive logical states are not consistent; we try to use statements with a high probability of truth, and we avoid statements with middling probability. That is, 0.9 and 0.1 are interesting (0.1 because we can invert it), while 0.5 is not.
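One way to picture "interesting" is as distance from 0.5; the measure below is purely illustrative.

```python
def interest(p):
    """How useful a Bayesian value is for choosing what to pursue next:
    zero at 0.5 (no information), one at either extreme (values below 0.5 invert)."""
    return abs(p - 0.5) * 2.0

print(round(interest(0.9), 2), round(interest(0.1), 2), round(interest(0.5), 2))  # 0.8 0.8 0.0
```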

We are going to run into the fool's problem,

The less you know, the more sure you are

in that we will have high probabilities when we know little about the problem and our model statements are working in their valid range, with only a small tail of uncertainty. As we use Consistent Reasoning to know more, high probabilities will fall away and low probabilities may become certain.

If we attempt to combine Bayesian analysis with Consistent Reasoning, the two modes of operation seem to be antithetical. Why is this so, and what can we do about it?

Essentially, Bayesian logic is trying to bundle up a complex object into a single number.  A probability of 0.9 represents the centre of gravity of outcomes, one False and nine True. The range is still 0 to 1. But there are other attributes of this logical range.

We have a statement, "All wingles are wongles". We check ten wingles and find that nine out of ten are wongles, so we ascribe a probability of 0.9. If we then checked twenty wingles and found that five were not wongles, we would change our probability.
We test a fair coin, and find that the probability of heads is 0.5. We would be prepared to believe this would apply to a thousand tosses, or a million. 
We estimate the probability that there was ever life on Mars as 0.5. To prove the statement false, we would need to scour the entire surface of the planet for any sign that life ever existed, and finding none, we might drop the probability to 0.1. If we found one fossil, the probability would go to 1.

The probability density functions behind these examples are complex objects - one is our estimate based on current data and would readily change, one is an estimate based on our knowledge of mechanics and could be changed in either direction only with great difficulty, and the final one is an estimate easily changed in one direction and changed in the other only with a great deal of evidence. We can incorporate a mechanism which updates the probability density functions in the model every time we receive information, so we "learn on the job", and we can weight the update so that some evidence has more importance.
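A sketch of such a mechanism, holding the belief as a Beta density so that evidence can be folded in with a weight; the BetaBelief class and the numbers are invented for illustration.

```python
class BetaBelief:
    """A belief about a probability, kept as a Beta(alpha, beta) density
    rather than a single number, so it can be updated as evidence arrives."""

    def __init__(self, alpha, beta):
        self.alpha = alpha
        self.beta = beta

    def update(self, successes, failures, weight=1.0):
        """Learn on the job: fold in new evidence, scaled by its importance."""
        self.alpha += weight * successes
        self.beta += weight * failures

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

# "All wingles are wongles": a lightly held 0.9 that moves readily with data.
wingles = BetaBelief(alpha=9, beta=1)
wingles.update(successes=15, failures=5)
print(round(wingles.mean(), 2))   # falls to 0.8

# A fair coin: the same sort of number, 0.5, but held with enough weight
# that a short run of heads barely moves it.
coin = BetaBelief(alpha=500, beta=500)
coin.update(successes=10, failures=0)
print(round(coin.mean(), 2))      # still 0.5
```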

Combining the Two Modes of Reasoning

Let's get down to an example. We want to be able to diagnose faults in a photocopier. We build a constraint model of the innards of the copier, relating electrostatic charge to toner delivery, and so on. Some constraints are hard-edged: no power, no work. Some constraints are soft: the toner density can lie within a range. We build failure modes into the model - paper jams, corona wire breakage.

We now learn certain things about the malfunctioning copier. Everything we learn is used to validate some part of the model - this is Consistent Reasoning - if B is working, and B requires A, then A is working.

What we could do is use Consistent Reasoning while there is anything new to propagate, then use Bayesian probability to fasten on the most likely failure, then ascertain something about it, then use Consistent Reasoning again to spread the new information around. This would minimise the system asking dopey questions. We would be using deduction, then induction, then deduction again, realising that we may be making deductions based on surmises on the part of the person reporting the fault rather than hard facts, so we may need to undo many inferences when we encounter an inconsistency.

Consistent Reasoning is more useful than Bayesian Logic in this example, but we may need help getting started, so we could treat Bayesian Logic as a useful heuristic for selecting the next target for Consistent Reasoning.
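A rough sketch of that alternation follows; the failure modes, symptoms and priors are invented stand-ins for a real copier model, and the questions to the user are simulated by looking up a hidden true state of the machine.

```python
# Invented miniature of the copier example. Each failure mode lists the
# symptoms it would force, with a rough prior; questions to the user are
# simulated by looking up a hidden true state of the machine.
failure_modes = {
    "paper_jam":    {"prior": 0.6, "symptoms": {"paper_feeds": False}},
    "corona_break": {"prior": 0.3, "symptoms": {"page_blank": True}},
    "toner_empty":  {"prior": 0.1, "symptoms": {"page_blank": True, "toner_low": True}},
}
true_state = {"paper_feeds": True, "page_blank": True, "toner_low": False}
observations = {}

def consistent(mode, observations):
    """Consistent Reasoning test: a failure mode survives only if none of the
    symptoms it forces contradict what has been observed so far."""
    return all(observations.get(s, v) == v
               for s, v in failure_modes[mode]["symptoms"].items())

candidates = list(failure_modes)
while True:
    # Deduction: spread what we know, discarding inconsistent failure modes.
    candidates = [m for m in candidates if consistent(m, observations)]
    if len(candidates) <= 1:
        break
    # Induction: fasten on the most probable candidates and ask about the
    # first symptom not yet observed (here answered from the true state).
    symptom = next((s for m in sorted(candidates,
                                      key=lambda m: -failure_modes[m]["prior"])
                    for s in failure_modes[m]["symptoms"]
                    if s not in observations), None)
    if symptom is None:
        break
    observations[symptom] = true_state[symptom]

print(candidates)   # ['corona_break'] after a few rounds of question and answer
```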

Technical Discussion

Existence

Extensions to Paradigm