Hypothesising in NLP

As things become complex and interdependent, it becomes necessary to experiment – to try things and observe what happens – rather than rely on predefined methods. A simple example is the eight queens problem – how to put eight queens on a chessboard so that no two attack each other. It is usually solved by experimentation: placing queens one at a time and taking a placement back when it leads to a dead end. Orion has structural backtrack, so we can build sentence structure and back out of the building if we encounter failure.
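As a minimal sketch of build-and-backtrack on the eight queens problem (plain Python, not Orion's structural backtrack; the name `solve_queens` is chosen here for illustration), each queen is placed in the next row, and the placement is undone when a later row has no safe column:

```python
def solve_queens(n=8):
    """Return one placement as a list (index = row, value = column), or None."""
    cols = []  # cols[r] is the column of the queen already placed in row r

    def safe(row, col):
        # A new queen is safe if no earlier queen shares its column or diagonal.
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(cols))

    def place(row):
        if row == n:
            return True               # every row has a queen: success
        for col in range(n):
            if safe(row, col):
                cols.append(col)      # build: tentatively place a queen
                if place(row + 1):
                    return True
                cols.pop()            # backtrack: undo the placement
        return False                  # no column works; the caller must retry

    return cols if place(0) else None


solution = solve_queens()
print(solution)
```

The same shape applies to a sentence: a tentative structure is built, and removed again when what follows cannot be made to fit.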

In language, it is easy to find examples of ambiguity. We start with

The computer generated images.

“generated” is an active verb phrase.

The computer generated images of the aircraft.

The computer generated images of the aircraft lost in battle.

The computer generated images of the aircraft lost in battle were blurry.

In the last sentence, “generated” becomes a verb trapped in a noun phrase. We can’t decide on the role of the word “generated” until we can see the entire sentence. The way we see the whole sentence is by trying alternatives for the things we are not sure about.

If we assume “generated” is an active verb phrase, we will get an error – we will have “were blurry” left over. If we assume “generated” is a participial, we will get an error because of the object phrase, so we are left with only one alternative.

Not all hypothesising is black and white – we will also need to differentiate among shades of grey.

The man practised running marathons.

The man can run marathons, so this is what he practised.

The man practised training scenarios.

You can’t train a scenario, so this must mean

The man practised the scenarios used for training.

An ambiguous example

The man practised learning scenarios.

This could mean

The man practised the learning scenarios.

The man practised learning the scenarios.

Another example

The system enables the display of special features and the behaviour of the vehicle.

This could be

The system enables (the display of special features) and (the behaviour of the vehicle).

Or

The system enables (the display of special features and the behaviour of the vehicle).

Both are grammatically acceptable, but only one of them is in keeping with the meaning of the rest of the document. That is, we need to use meaning to determine which alternative the writer meant.

When hypothesising, we need to know whether we are dealing with a black and white case or a grey case, where we have to consider fitness for purpose. If it is a black and white case, we can stop as soon as we have a successful result.
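The difference between the two kinds of case can be sketched as a small selection routine. This is an illustration only: the function `choose`, the consistency checks and the fit scores are all made up here, standing in for real structure building and relation modelling.

```python
def choose(alternatives, is_consistent, fitness=None):
    """Pick a reading from a list of alternatives.

    Black and white (fitness is None): return the first alternative that
    builds without error. Grey: score every error-free alternative and
    return the best fit.
    """
    if fitness is None:
        for alt in alternatives:
            if is_consistent(alt):
                return alt            # first success is enough; stop here
        return None
    survivors = [alt for alt in alternatives if is_consistent(alt)]
    return max(survivors, key=fitness, default=None)


# Black and white: only one role for "generated" survives the full sentence.
# The lambda is a stand-in for actually building the sentence structure.
roles = ["active verb phrase", "participial", "verb inside noun phrase"]
print(choose(roles, lambda r: r == "verb inside noun phrase"))

# Grey: both readings are grammatical, so a fit score has to decide.
readings = ["practised the learning scenarios",
            "practised learning the scenarios"]
scores = {readings[0]: 0.7, readings[1]: 0.3}   # illustrative values only
print(choose(readings, lambda r: True, fitness=scores.get))
```

In the black and white case the loop returns at the first success, while the grey case has to build every surviving alternative before it can compare them.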

Hypothesising Objects

The types of objects in a sentence whose purpose cannot be known without a larger view of the structure of the sentence:

Splitting Nouns

When we have a string of nouns, we can’t be sure whether they are joined in one long noun phrase, or should be split in two. Sometimes the context will provide clues.

The resource company records show is nearly exhausted.

Is this

The resource that company records show is nearly exhausted.

The resource company that records show is nearly exhausted.

NounVerbPhrase

Could be a noun or a verb – if a noun, could be part of a prior noun phrase.

The airplane costs more than doubled.

The airplane costs more than double.

InterimPresentParticipial

If the interimparticipial has compatible noun phrases on either side of it, and the following phrase is not a time phrase, it is a participial.

If there are incompatible noun phrases on either side, and there is not a prior transparticipial (“he kept the engine running”), it must be a noun, and is merged with the preceding noun phrase.

If it is preceded by a RelationControl relation (“he practised running marathons”) and has an acceptable object, it becomes part of the verb phrase.
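The three rules above can be written as a small decision function. This is a sketch under assumptions: the flag names are hypothetical, the rules are tried in the order given above (the text does not state a precedence), and anything the rules do not settle is returned as unresolved so that it can be hypothesised.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical flags describing the surroundings of an -ing word."""
    noun_phrase_each_side: bool = False
    noun_phrases_compatible: bool = False
    followed_by_time_phrase: bool = False
    prior_transparticipial: bool = False
    after_relation_control: bool = False
    acceptable_object: bool = False


def resolve_present_participial(ctx):
    if ctx.noun_phrase_each_side:
        if ctx.noun_phrases_compatible:
            if not ctx.followed_by_time_phrase:
                return "participial"
        elif not ctx.prior_transparticipial:
            return "noun"          # merged with the preceding noun phrase
    if ctx.after_relation_control and ctx.acceptable_object:
        return "verb phrase"       # absorbed into the verb phrase
    return "unresolved"            # left for hypothesising


# "the person training staff": compatible noun phrases on either side.
print(resolve_present_participial(
    Context(noun_phrase_each_side=True, noun_phrases_compatible=True)))

# "he practised running marathons": RelationControl plus acceptable object.
print(resolve_present_participial(
    Context(after_relation_control=True, acceptable_object=True)))
```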

InterimPastParticipial

Could be a Past Participial or a verb phrase, or a verb captured in a noun phrase (“the computer generated image was blurred”). If the preceding noun phrases are not compatible as an object, it must be a verb phrase.

SubordinatePrepositional

Could be a subordinate conjunction or a preposition. If there is no following verb phrase, or no leading verb phrase, it is taken as a preposition.

CoordinateConjunction

Could be grouping objects, or joining clauses. In some cases, we need to resolve ambiguity.

He talked to Jack and Jill and Fred went home.

Example

The person training staff required for the store opening needs to exercise care.

“person training staff” – this looks like a participial: a person training the staff fits the modelling of “to train”. However, “the new recruit training staff” is also possible – it too has a person on either side of “training”. If “new recruit” is modelled as untrained, and the person who does training is modelled as a trainer – someone who has been trained – we can reject “new recruit” as a subject.

“required” – this can be a participial or a verb phrase. The object of a participial cannot be the subject of a verb phrase, so the only possible subject is “person”. If it is a participial, “staff” can be required. Two alternatives.

“for” can be a subordinate conjunction or a preposition. We cannot be sure whether there is a following verb phrase. Two alternatives.

“opening” – stores open every day. We can’t be sure about “opening needs” (it could be a variant on “initial needs”), as the following “needs” is a nounverb phrase. Two alternatives.

“needs” – this can be a noun, or it can be a verb phrase. The possible subjects are “person” and “store opening” (if “for” is a subordinate conjunction). Two alternatives.

“care” can be a noun or a verb. It is very likely to be a noun here, but it is possible to say “if those people who wish to exercise care to come this way”. Unknown number of possible subjects. Two alternatives.

We now have six objects with alternatives, and need to choose the most appropriate combination. This is about the limit for humans.
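To give a feel for the size of the search, the six objects can be laid out with their two alternatives each and every combination enumerated. The labels are paraphrased from the walkthrough above, and the single pruning rule is the one stated in the text for “for”; this is a sketch, not how Orion stores its alternatives.

```python
from itertools import product

# Two alternatives per word, paraphrased from the walkthrough.
alternatives = {
    "training": ("participial", "noun"),
    "required": ("participial", "verb phrase"),
    "for":      ("subordinate conjunction", "preposition"),
    "opening":  ("ends 'store opening'", "starts 'opening needs'"),
    "needs":    ("noun", "verb phrase"),
    "care":     ("noun", "verb"),
}

combinations = [dict(zip(alternatives, choice))
                for choice in product(*alternatives.values())]
print(len(combinations))   # 2 ** 6 = 64


def consistent(c):
    # If "required" is a participial there is no verb phrase to the left
    # of "for", so "for" cannot be a subordinate conjunction.
    return not (c["required"] == "participial"
                and c["for"] == "subordinate conjunction")


remaining = [c for c in combinations if consistent(c)]
print(len(remaining))      # 64 - 16 = 48
```

Each further constraint removes another slice of the combinations, which is why it pays to apply the black and white rules before weighing the grey ones.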

We try “required” as a verb phrase. There seems to be no good object for “required”, so we make it a participial. Its object can be either “staff” or “person”. Each is possible – we choose the nearer one. This forces “for” to be a preposition, as there is no verb phrase to its left.

We do not yet have a verb phrase for “person”. It could be “needs” or “care”. “Person care” does not work – a singular noun and a plural verb, so it must be “needs”.

We have no spare subject for “care”, so it must be a noun.

In this example, we had to make choices and see where they lead. We combined grammatical knowledge with relation modelling knowledge.

Some questions:

How do we represent the alternatives so the state of hypothesising survives undoing?

Do we use the existing building mechanism to build the structures, undoing where necessary? This would seem expensive.

We could build the relations with unknowns, and then use IS operators to switch the various hypothetical connections. This is the easiest conceptually.

[Figure: persontrainedstaff.JPG]

If we switch on one hypothesis, we need to switch off others that would conflict with it. We can do this with XOR operators – if we get an inconsistency in doing so, we can already rule out the alternative.

[Figure: Hypothesis.JPG]

That is, if we make “person” the subject of “required”, it can’t be the subject of “needs” or “care”, and “staff” can’t be the object of “required”. If “required” is not a verb phrase, “for” cannot be a subordinate conjunction, as there is no other possible verb phrase on its left.
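The switching behaviour can be sketched with a small class. Note the assumptions: the class and method names are hypothetical, and the sketch uses pairwise mutual exclusion (at most one of a pair may be on) as a simple stand-in for the XOR operators, not a model of how Orion implements them.

```python
class HypothesisSet:
    def __init__(self):
        self.state = {}       # name -> True/False; a missing name is undecided
        self.conflicts = {}   # name -> names that cannot be true with it

    def exclusive(self, a, b):
        """Declare that a and b cannot both be switched on."""
        self.conflicts.setdefault(a, set()).add(b)
        self.conflicts.setdefault(b, set()).add(a)

    def switch_on(self, name):
        """Switch name on and its rivals off; return False on inconsistency."""
        if self.state.get(name) is False:
            return False                      # already ruled out
        if any(self.state.get(rival) for rival in self.conflicts.get(name, ())):
            return False                      # a rival is already on
        self.state[name] = True
        for rival in self.conflicts.get(name, ()):
            self.state[rival] = False
        return True


h = HypothesisSet()
subject = "person is subject of required"
for rival in ("person is subject of needs",
              "person is subject of care",
              "staff is object of required"):
    h.exclusive(subject, rival)

print(h.switch_on(subject))                       # True
print(h.switch_on("person is subject of needs"))  # False: already switched off
```

A refused `switch_on` is the inconsistency mentioned above: the alternative can be ruled out without building any structure for it.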

[Figure: subprep.jpg]

The connection between the alternative of “required” being a verb phrase and “for” being a subordinate conjunction requires special logical modelling:

True verb phrase allows true or false subordinate conjunction

False verb phrase requires false subordinate conjunction, which forces a true preposition
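These two rules amount to an implication plus an either-or, and the permitted states can be listed by brute force. One assumption is added here beyond the rules as stated: that a true subordinate conjunction also means a false preposition, so that “for” is exactly one of the two.

```python
from itertools import product


def allowed(verb_phrase, sub_conjunction, preposition):
    # A subordinate conjunction needs a verb phrase on its left.
    if sub_conjunction and not verb_phrase:
        return False
    # Assumption: "for" is exactly one of conjunction or preposition.
    return preposition != sub_conjunction


rows = [row for row in product((True, False), repeat=3) if allowed(*row)]
for vp, conj, prep in rows:
    print(f"verb phrase={vp!s:5}  conjunction={conj!s:5}  preposition={prep!s:5}")
```

Three of the eight combinations survive: a true verb phrase with either role for “for”, and a false verb phrase with “for” as a preposition.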

If “for” is a preposition, then “the store” is at least part of a prepositional noun phrase. “to exercise” requires a person as the subject, and we only have “person” or “staff”.

If “required” is a verb phrase, we have

The person required

For (as conjunction)

The store needs to exercise care

“Required” has no object. We conclude that “required” is unlikely to be a verb phrase. If “required” is a past participial, then “for” is a preposition which swallows “the store opening” and we have

The person needs to exercise care

We turn “required” into a past participial, “for” into a preposition, “needs” into a verb phrase, and “care” into a noun, remove the hypothesising structure and complete the building of the sentence structure.

Conclusion

Hypothesising is time-consuming, but offers a more compact solution than an enormous set of patterns. It is the only approach that allows two alternatives to be considered “side by side” when there are subtle differences in meaning between them.