Seeing Alternatives More Clearly

Pruning alternatives by inference is all very well, but at some stage the probabilities of the alternative possible structures must be computed.

Parsing becomes difficult when any of the following alternatives is present:

NounVerbPhrase – the word can be either a noun or a verb.

CoordinatePhrase – the coordination can be Clausal, Grouping, Local or Internal.

SubordinatePrepositional – the word can be either a subordinate conjunction or a preposition.

We can only get so far with patterns – at some point we have to consider the alternatives in combination, preferably after everything that can be eliminated directly has been eliminated.
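Considering the alternatives in combination can be pictured as enumerating one reading per ambiguity site. The following is a minimal sketch in Python, not the system's actual data structure: the dictionary `sites`, its labels, and the helper `combinations` are all hypothetical names chosen for illustration.

```python
from itertools import product

# Hypothetical ambiguity sites, each with the readings that survived pruning.
# The site labels and the surviving readings are illustrative assumptions.
sites = {
    "and": ["Grouping", "Local", "Internal"],
    "to": ["links to provision", "links to data"],
}

def combinations(sites):
    """Yield every joint assignment of one reading per ambiguity site."""
    names = list(sites)
    for choice in product(*(sites[name] for name in names)):
        yield dict(zip(names, choice))

all_combos = list(combinations(sites))
print(len(all_combos))  # 3 readings x 2 readings = 6
```

The count is the product of the readings left at each site, which is why direct elimination should come first: every reading removed beforehand shrinks the number of combinations to be weighed.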

We also need to bring in other information, not just consider the alternatives blocking parsing. An example –

"provision of location and time reference data to external systems"

The "and" here can be Grouping (linking to "provision") or Local (linking "location" with "data") or Internal (forming the construct "location data" using the stem of the trailing noun phrase). We can’t rely on similarity between the two objects when checking for Internal: they can be very different yet fall into the same group, as in "he posted the earthquake and weight loss data". We can also find a connection between "location" and "data" in the same sentence, strengthening the case for Internal.

What the "to" links to is important in determining the most appropriate alternative – "to" has a strong affinity to "provision" (there is a direct link between the preposition To and ToProvide) and a weak affinity to "data" ("to" requires a relation, except for constructions like "the road to Ghent").

We create a tentative structure to represent all the ways of linking these objects (the initial structure must already have been heavily pruned for this to be feasible), and end up with

[Figure: PruningPathways.WMF]

where S is Strong, M is Medium (no difference on different paths), and W is Weak, and where the probabilities are related to particular connections. When calculating the probabilities for "location and time reference data", we had to allow for merging – that is,

"location [data] and time reference data"

is one of the pairs.
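The merging step can be sketched as follows. This is an illustration under one stated assumption, that the head of the trailing noun phrase is its last word; the function name `candidate_pairs` is hypothetical.

```python
def candidate_pairs(left, trailing_phrase):
    """Return the conjunct pairs to score for 'left and trailing_phrase'.

    Assumption: the head of the trailing noun phrase is its last word.
    """
    head = trailing_phrase[-1]
    right = " ".join(trailing_phrase)
    return [
        (left, right),                # the conjuncts as written
        (f"{left} [{head}]", right),  # merged: the head is shared
    ]

pairs = candidate_pairs("location", ["time", "reference", "data"])
for left, right in pairs:
    print(f"{left} and {right}")
```

Each pair would then be given its own probability, so the merged reading competes with the unmerged one rather than replacing it.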

If we start out on a Strong path, and chop away the inconsistent paths (just marking them to prevent traversal), we get

[Figure: PruningPathways1.WMF]

A leading preposition has to link to the Anded group following it; a trailing preposition links either to the group or to the last member of the group.
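The rule for prepositions adjacent to an Anded group can be written down as a small function. This is a sketch only: `allowed_targets` and its arguments are hypothetical names, and it covers just the rule as stated, not links to objects outside the group.

```python
def allowed_targets(position, group):
    """List what a preposition next to an Anded group may link to.

    position: "leading" (before the group) or "trailing" (after it).
    group: the members of the Anded group, in sentence order.
    """
    whole_group = tuple(group)
    if position == "leading":
        return [whole_group]
    if position == "trailing":
        return [whole_group, group[-1]]
    raise ValueError(f"unknown position: {position!r}")

group = ["location data", "time reference data"]
leading = allowed_targets("leading", group)
trailing = allowed_targets("trailing", group)
print(leading)
print(trailing)
```

Any tentative link that falls outside the returned targets is inconsistent with the chosen reading and can be marked to prevent traversal.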

If we choose the Strong link again, we get

[Figure: PruningPathways2.WMF]

We arrange the parse structure to suit, log the result of the analysis, and remove the tentative structure.

If we can find only weak paths, the sentence is marked as ambiguous.
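The walk described above (take the strongest open link, mark the links it rules out, repeat, and give up if only Weak links remain) can be sketched as a greedy loop. Everything here is an assumption made for illustration: the `Link` class, the `resolve` function, the strengths assigned to each link, and the `excludes` sets do not come from the figures.

```python
from dataclasses import dataclass, field

RANK = {"S": 3, "M": 2, "W": 1}  # Strong, Medium, Weak

@dataclass
class Link:
    name: str
    strength: str                               # "S", "M" or "W"
    site: str                                   # the decision this link settles
    excludes: set = field(default_factory=set)  # names of inconsistent links
    blocked: bool = False                       # marked, not deleted

def resolve(links):
    """Return (chosen link names, ambiguous flag)."""
    chosen, decided = [], set()
    while True:
        open_links = [l for l in links
                      if not l.blocked and l.site not in decided]
        if not open_links:
            return chosen, False
        best = max(open_links, key=lambda l: RANK[l.strength])
        if best.strength == "W":
            return chosen, True      # only weak paths remain: ambiguous
        chosen.append(best.name)
        decided.add(best.site)
        for l in links:              # mark inconsistent paths
            if l.name in best.excludes:
                l.blocked = True

# Illustrative strengths and exclusions, invented to exercise the loop.
links = [
    Link("to -> provision", "S", "to", {"and: Grouping"}),
    Link("to -> data", "W", "to"),
    Link("and: Internal", "S", "and", {"and: Local", "and: Grouping"}),
    Link("and: Local", "M", "and"),
    Link("and: Grouping", "W", "and"),
]
result, ambiguous = resolve(links)
print(result, ambiguous)
```

Marking links as blocked instead of deleting them mirrors the text: the tentative structure stays intact until the analysis has been logged, and only then is it removed.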

The method described here is computationally expensive, but it prevents the parse process from stalling on unpruned alternatives.

Constraint Reasoning

Constraint Reasoning in Parsing