This note describes a parsing algorithm for a slightly unusual query syntax. I will explain how to discover it by taking a series of small steps starting from traditional logic program syntax (skip to the first section if you haven’t seen that before).
summary. The point of the new syntax is that it requires fewer binary join variables (variables that show up in exactly two positions). For instance, if I want to know about all the cats that are on shelves1, I might write this Prolog query:
cat(X), on(X, Y), shelf(Y)
but in the new syntax, we can write
cat on shelf
This relies on two ideas: allow predicates to come out of order
that way
oncan show up in-between its arguments
and allow unary predicates to be used like variables
so instead of using a unary predicate application like
cat Xto pick out each cat, and then pluggingXsomewhere else in the query, we just writecatwhere we want it to go.
I want to know about all the cats that are on shelves:
cat(X), on(X, Y), shelf(Y)
Each section will begin with this example written in a different way.
cat X, on X Y, shelf Y
similar to syntax of Standard ML, Haskell.
: cat/1 shelf/1 on/2
cat X on X Y shelf Y
Arity refers to the number of parameters that a predicate
needs to be applied to, and the number of columns in the corresponding
relation. So every cat tuple contains one value, which we
take to represent the cat in question, and on has
arity two because it relates two things to each other (the “supported”
and the “supporter”).
You could call a set of predicates and their arities a schema. With a schema, and assuming that each predicate is fully applied, we know where the commas should be, so we could omit them. Commas are still useful for readability.
on cat shelf
This is the first substantial step. We read this as some cat is on some shelf, where the variables representing those entities have been elided. We need to infer them. We do so with this algorithm:
Note that when we say unary predicate we mean both a predicate that starts out unary, and any higher-arity predicate that has been partially applied. Here is how the algorithm processes the example:
; output | stack | input
------------------------
(0) . | . | on cat shelf
(2) . | on | cat shelf
(2) . | on cat | shelf
(1) cat X | on X | shelf
(2) cat X | on X shelf | .
(1) cat X, on X Y, shelf Y | . | .
The final result is a series of fully applied predicates.
(aside if you are familiar with the jargon: Prolog programs have a verbosity problem compared to functional programs; it’s kind of like being forced to write in ANF form. This algorithm is similar to the ANF translation.)
cat X, X on Y, shelf Y
or simply
cat on shelf
We use the same algorithm, but modify step (1):
Here’s a more elaborate example that involves a ternary word:
: telescope/1 with/2 sees/3
; cat sees cat with telescope
(0) . | . | cat sees cat with telescope
(2) . | cat | sees cat with telescope
(2) . | cat sees | cat with telescope
(1) cat X | sees X | cat with telescope
(2) cat X | sees X cat | with telescope
(1) cat X, cat Y | sees X Y | with telescope
(2) cat X, cat Y | sees X Y with | telescope
(1) cat X, cat Y, sees X Y T | with T | telescope
(2) cat X, cat Y, sees X Y T | with T telescope | .
(1) cat X, cat Y, sees X Y T, with T Z, telescope Z | . | .
Linguistic aside: we’re thinking of sees/3 as a
relation that marks an instance of seeing; the instance is
captured by T. The subject and object (X, Y)
are also captured by sees/3, but other optional modifiers
(the instrument, in this example) can be attached using other
predicates.
Final notes:
The algorithm handles variables in the input stream: treat each
variable like an arity-one predicate, and when you need to “apply”
variable X to a fresh variable Y, generate the
item X = Y. You can eliminate these equalities via
substitution afterwards or as you go.
Pairs of unary predicates, for instance
cat telescope and telescope cat, unwind into
the same items, just in a different order. Assuming all relations are
finite, the query defined is the same in either case. This property is
necessary for the validity of re-ordering. You might not want to use
this in a context where the order of predicates matters for
performance/termination.
The algorithm doesn’t handle >2 arity predicates as well as you might want (can’t place multiple arguments before the predicate).
It allows for various confusing expressions like
on on cat cat = on (on cat) cat = on cat Y, on Y cat
that is, “a cat is on something that is on a cat”. This sentence has the property that its parse requires more than two entries of stack space. Conjecture: sentences that require only two entries more closely resemble English clauses.
There are many useful operators to discover! For instance, if you later need to refer to the cat, you are forced to rewrite the query:
cat X, X on shelf
Instead, let’s define a ternary equality predicate, written
&. We’ll assume that & X Y Z is true
exactly when X = Y = Z. Then we can write
cat & X on shelf
which is lexically a smaller depature from the original query. Note that this also helps with “every long black cat on a shelf”:
(long & black & cat) on shelf
Alternatively we could use : as the glyph, and write
cute things like
(X : cat & black) on shelfIt probably fails on more difficult natural language examples. It would be interesting to see if types can help.