cat on shelf?

This note describes a parsing algorithm for a slightly unusual query syntax. I will explain how to discover it by taking a series of small steps starting from traditional logic program syntax (skip to the first section if you haven’t seen that before).

summary. The point of the new syntax is that it requires fewer binary join variables (variables that show up in exactly two positions). For instance, if I want to know about all the cats that are on shelves¹, I might write this Prolog query:

cat(X), on(X, Y), shelf(Y)

but in the new syntax, we can write

cat on shelf

This relies on two ideas: allow predicates to come out of order

that way on can show up in-between its arguments

and allow unary predicates to be used like variables

so instead of using a unary predicate application like cat X to pick out each cat, and then plugging X somewhere else in the query, we just write cat where we want it to go.

Prolog Syntax

I want to know about all the cats that are on shelves:

cat(X), on(X, Y), shelf(Y)

Each section will begin with this example written in a different way.

ML-flavored Prolog Syntax

cat X, on X Y, shelf Y

similar to syntax of Standard ML, Haskell.

Record Arities, Elide Commas

: cat/1 shelf/1 on/2

cat X on X Y shelf Y

Arity refers to the number of parameters that a predicate needs to be applied to, and the number of columns in the corresponding relation. So every cat tuple contains one value, which we take to represent the cat in question, and on has arity two because it relates two things to each other (the “supported” and the “supporter”).

You could call a set of predicates and their arities a schema. With a schema, and assuming that each predicate is fully applied, we know where the commas should be, so we could omit them. Commas are still useful for readability.

Allow Unary Predicates in Place of Variables

on cat shelf

This is the first substantial step. We read this as some cat is on some shelf, where the variables representing those entities have been elided. We need to infer them. We do so with this algorithm:

Prepare an output, a stack, and the input starting with left-most word. The stack and output contain items: predicates that have been applied to zero or more variables.
If the stack contains at least two items, and the top item is unary: generate a fresh variable and apply the top two items to it.
1. If the stack now contains any nullary (fully applied) items, move them to the output.
Otherwise, push the next word from the input onto the stack.

Note that when we say unary predicate we mean both a predicate that starts out unary, and any higher-arity predicate that has been partially applied. Here is how the algorithm processes the example:

; output | stack | input
------------------------
(0)  . | . | on cat shelf
(2)  . | on | cat shelf
(2)  . | on cat | shelf
(1)  cat X | on X | shelf
(2)  cat X | on X shelf | .
(1)  cat X, on X Y, shelf Y | . | .

The final result is a series of fully applied predicates.

(aside if you are familiar with the jargon: Prolog programs have a verbosity problem compared to functional programs; it’s kind of like being forced to write in ANF form. This algorithm is similar to the ANF translation.)

Final Step: Allow Argument Re-ordering

cat X, X on Y, shelf Y

or simply

cat on shelf

We use the same algorithm, but modify step (1):

(modified) If the stack contains at least two items, and either of the top two items is unary: generate a fresh variable and apply the top two items to it.

Here’s a more elaborate example that involves a ternary word:

: telescope/1 with/2 sees/3

; cat sees cat with telescope

(0)  . | . | cat sees cat with telescope
(2)  . | cat | sees cat with telescope
(2)  . | cat sees | cat with telescope
(1)  cat X | sees X | cat with telescope
(2)  cat X | sees X cat | with telescope
(1)  cat X, cat Y | sees X Y | with telescope
(2)  cat X, cat Y | sees X Y with | telescope
(1)  cat X, cat Y, sees X Y T | with T | telescope
(2)  cat X, cat Y, sees X Y T | with T telescope | .
(1)  cat X, cat Y, sees X Y T, with T Z, telescope Z | . | .

Linguistic aside: we’re thinking of sees/3 as a relation that marks an instance of seeing; the instance is captured by T. The subject and object (X, Y) are also captured by sees/3, but other optional modifiers (the instrument, in this example) can be attached using other predicates.

Final notes:

The algorithm handles variables in the input stream: treat each variable like an arity-one predicate, and when you need to “apply” variable X to a fresh variable Y, generate the item X = Y. You can eliminate these equalities via substitution afterwards or as you go.
Pairs of unary predicates, for instance cat telescope and telescope cat, unwind into the same items, just in a different order. Assuming all relations are finite, the query defined is the same in either case. This property is necessary for the validity of re-ordering. You might not want to use this in a context where the order of predicates matters for performance/termination.
The algorithm doesn’t handle >2 arity predicates as well as you might want (can’t place multiple arguments before the predicate).
It allows for various confusing expressions like
```
on on cat cat = on (on cat) cat = on cat Y, on Y cat
```
that is, “a cat is on something that is on a cat”. This sentence has the property that its parse requires more than two entries of stack space. Conjecture: sentences that require only two entries more closely resemble English clauses.
There are many useful operators to discover! For instance, if you later need to refer to the cat, you are forced to rewrite the query:
```
cat X, X on shelf
```
Instead, let’s define a ternary equality predicate, written &. We’ll assume that & X Y Z is true exactly when X = Y = Z. Then we can write
```
cat & X on shelf
```
which is lexically a smaller depature from the original query. Note that this also helps with “every long black cat on a shelf”:
```
(long & black & cat) on shelf
```
Alternatively we could use : as the glyph, and write cute things like
```
(X : cat & black) on shelf
```
It probably fails on more difficult natural language examples. It would be interesting to see if types can help.

https://cutfree.net

↩︎