Chains and Recursion

A prepositional chain can include prepositionals and other symbols, such as participials, commas, conjunctions and relative pronoun clauses. A chain can run to more than twenty symbols, as in this example (after stripping out various parentheticals):

...with respect to any improvement work the Tenant desires to perform in the Expansion Area equal to $50.00, multiplied by the number of square feet of rentable area in the Expansion Area, multiplied by the number of calendar months from the Expansion Area Commencement Date to the day preceding the eleventh anniversary of the Lease Commencement Date, divided by 132.

This chain has the following elements:

PrepositionalPhrase
ChainRelativeClause
PrepositionalPhrase
PrepositionalPhrase
CommaPhrase
InterimVerbPhrase
PrepositionalPhrase
PrepositionalPhrase
PrepositionalPhrase
PrepositionalPhrase
CommaPhrase
InterimVerbPhrase
PrepositionalPhrase
PrepositionalPhrase
PrepositionalPhrase
PrepositionalPhrase
ParticipialPhrase
PrepositionalPhrase
CommaPhrase
PastParticipial
PrepositionalPhrase

The InterimVerbPhrase symbol is used because we can’t yet be sure whether the word is a past participle, which would make it a Participial, or the past tense of a verb, which would make it a VerbPhrase. That can be known only after applying semantics to the chain.

The ChainRelativeClause may be connected to the trailing PrepositionalPhrases, so semantics must be applied here as well, to bound the clause before extracting it.

A typical chain detection structure:

STRUCTURE1(PrepositionalChain,{NOCONNECT(BeforeChain),PrepositionalChainComponent1, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent, PrepositionalChainComponent2, NOCONNECT(AfterChain)}) ! 14 symbols

To avoid structures with ever more connections, or a change to a recursive form, prepositionals that occur in blocks are rolled into combinations. In this instance that decreases the number of symbols in the chain from twenty-one to fourteen. A combination structure:

STRUCTURE1(CombinedPrepositionalPhrase4,{NOCONNECT(BeforeCombinedPrepositional), PrepositionalPhrase, PrepositionalPhrase, PrepositionalPhrase, PrepositionalPhrase, NOCONNECT(AfterCombinedPrepositional)})

This structure combines four consecutive prepositional phrases into one combined symbol.
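The reduction from twenty-one symbols to fourteen can be checked with a short Python sketch. It is only an illustration of the rolling-up idea, not the parser's own mechanism: the function name `roll_up` is hypothetical, and it assumes that every run of two or more consecutive PrepositionalPhrase symbols becomes a single CombinedPrepositionalPhraseN symbol, where N is the length of the run.

```python
from itertools import groupby

PP = "PrepositionalPhrase"

# The twenty-one elements of the example chain, in order.
CHAIN = [
    PP, "ChainRelativeClause", PP, PP, "CommaPhrase", "InterimVerbPhrase",
    PP, PP, PP, PP, "CommaPhrase", "InterimVerbPhrase",
    PP, PP, PP, PP, "ParticipialPhrase", PP, "CommaPhrase",
    "PastParticipial", PP,
]


def roll_up(symbols, target=PP, prefix="CombinedPrepositionalPhrase"):
    """Replace each run of 2+ consecutive `target` symbols with one symbol.

    Hypothetical helper: the naming scheme (prefix + run length) mirrors
    the structure names in the text, but the logic is an assumption.
    """
    rolled = []
    for symbol, run in groupby(symbols):
        count = len(list(run))
        if symbol == target and count >= 2:
            rolled.append(f"{prefix}{count}")
        else:
            # Single prepositionals and all other symbols pass through.
            rolled.extend([symbol] * count)
    return rolled


rolled = roll_up(CHAIN)
print(len(CHAIN), "->", len(rolled))  # prints "21 -> 14"
```

The sketch does not cap the run length. The totem pole shown later only goes up to CombinedPrepositionalPhrase6, so a longer run would presumably need splitting.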

But these combination structures cause a problem for the operation of the parser: it won’t build anything if it finds more than one structure that matches, and here it might find four or five. To avoid this, a totem pole describing precedence is constructed. If two structures are found, and one of them is above the other in the totem pole, the lower one is removed as a possibility. A totem pole looks like this:

COMBINED1(PrepositionalChain, CombinedPrepositionalPhrase6)
COMBINED1(CombinedPrepositionalPhrase6, CombinedPrepositionalPhrase5)
COMBINED1(CombinedPrepositionalPhrase5, CombinedPrepositionalPhrase4)
COMBINED1(CombinedPrepositionalPhrase4, CombinedPrepositionalPhrase3)
COMBINED1(CombinedPrepositionalPhrase3, CombinedPrepositionalPhrase2)

If a PrepositionalChain structure can encompass and match the chain, it is used. If it cannot, then the largest combining structure in the totem pole is used. The reduced number of symbols may now allow a PrepositionalChain structure to match the symbols.
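The precedence rule can be sketched in Python as follows. This is a minimal illustration, not the parser's implementation. It assumes that COMBINED1(a, b) places a directly above b, and that "above" is transitive down the pole. The names `structures_below` and `resolve` are hypothetical.

```python
# Each pair is (higher, lower), copied from the COMBINED1 lines.
TOTEM_POLE = [
    ("PrepositionalChain", "CombinedPrepositionalPhrase6"),
    ("CombinedPrepositionalPhrase6", "CombinedPrepositionalPhrase5"),
    ("CombinedPrepositionalPhrase5", "CombinedPrepositionalPhrase4"),
    ("CombinedPrepositionalPhrase4", "CombinedPrepositionalPhrase3"),
    ("CombinedPrepositionalPhrase3", "CombinedPrepositionalPhrase2"),
]


def structures_below(name, pairs):
    """Return every structure that sits anywhere below `name` on the pole."""
    directly_below = {}
    for higher, lower in pairs:
        directly_below.setdefault(higher, set()).add(lower)
    found, stack = set(), [name]
    while stack:
        for lower in directly_below.get(stack.pop(), ()):
            if lower not in found:
                found.add(lower)
                stack.append(lower)
    return found


def resolve(candidates, pairs):
    """Drop any candidate that is below another candidate on the pole."""
    losers = set()
    for candidate in candidates:
        losers |= structures_below(candidate, pairs) & set(candidates)
    return [c for c in candidates if c not in losers]


matches = [
    "CombinedPrepositionalPhrase2",
    "CombinedPrepositionalPhrase4",
    "CombinedPrepositionalPhrase3",
]
print(resolve(matches, TOTEM_POLE))  # prints "['CombinedPrepositionalPhrase4']"
```

With four prepositionals in a row, the structures for two, three and four phrases all match, and only the largest survives. If PrepositionalChain is among the candidates, it removes all of the combining structures.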

By this means, a chain of forty or fifty symbols (in other words, a completely unwieldy, typically lawyerly sentence) can be accommodated without using recursion.

Why not use recursion?

The pattern being swallowed isn’t simply a large number of identical symbols; it will usually have specialised incoming and outgoing detection symbols, as the PrepositionalChain does (BeforeChain, AfterChain). While these could be accommodated in a recursive style, the parser does not start at a symbol and move forward. It starts anywhere (around any point of activity) and moves forward and backward around the symbol, which makes a recursive approach rather messy.
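One way to see why a flat, fixed-length structure suits this style of matching is the Python sketch below. It is purely illustrative and makes assumptions the text does not confirm: symbols are plain strings, a structure is a flat list, and `match_around` is a hypothetical name. Because the pattern has a known length, it can be aligned so that any of its slots sits on the point of activity, and then checked in both directions at once.

```python
def match_around(symbols, pattern, anchor):
    """Return start indices where `pattern` matches and covers symbols[anchor].

    Hypothetical sketch: each slot of the flat pattern is tried in turn as
    the one sitting on the anchor, so matching extends backward and forward
    from the point of activity rather than from the left edge.
    """
    hits = []
    for offset in range(len(pattern)):
        start = anchor - offset          # symbols before the anchor
        end = start + len(pattern)       # symbols after the anchor
        if start < 0 or end > len(symbols):
            continue
        if symbols[start:end] == pattern:
            hits.append(start)
    return sorted(hits)


PP = "PrepositionalPhrase"
symbols = ["BeforeChain", PP, PP, PP, PP, "AfterChain"]
pattern = ["BeforeChain", PP, PP, PP, PP, "AfterChain"]

# Activity begins at the third prepositional, not at the left edge.
print(match_around(symbols, pattern, anchor=3))  # prints "[0]"
```

A recursive definition has no fixed slots to align on the anchor, so it would have to grow in both directions while also tracking where the incoming and outgoing detection symbols belong.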
