DRAFT

Graphs

We suppose a "graph database": a collection of nodes and binary relations between them. Each tuple, also called an edge, is an ordered pair of nodes along with a marker denoting the relation. A relation is the set of all edges with a particular marker.

An example:

graph1

  A R1 B
  B R1 C
  C R1 A

  B R2 C
  C R2 B
  A R2 B

It contains three objects (A, B, C), two relations (R1, R2), and six edges.

We will describe a series of constructions possible with these graphs. The constructions will have a few basic properties in common:

  1. They can be written down using a simple syntax. We will describe various basic clauses, and a construction will generally be a comma-separated list of clauses.
  2. They may refer directly to relation names (which will be capitalized, as in graph1).
  3. There is a class of nodes (node literals) that can be written down. When writing constructions, we will write them uppercase, as above. Node variables will be written lowercase.
  4. A construction can be evaluated with respect to a particular graph. The semantics of this evaluation map will be sketched out in what follows.

Constraint

The first construction allows us to bind pairs of nodes to names by way of a relation.

  x R y

We call x and y variable names. The meaning is, select an edge of R, then bind the left-hand node to x and right-hand node to y. Subsequent clauses will be free to refer to these names. When multiple matching edges exist for a clause, the meaning is to take all possible ways of binding the names and consider them in parallel.

For example:

  x R1 y, y R2 z

When applied to graph1, this yields three binding lists:

{ [x:A,y:B,z:C]
, [x:B,y:C,z:B]
, [x:C,y:A,z:B]
}

We will call each binding list a "blob". The result of a query is a set of blobs.

! The notion of "blob" will be extended repeatedly below. We will mark these additions with italics. When referring specifically to the current set of names and bindings within a blob, we may use the term context.

Subconstruction

[ ]

The next construction is "encapsulation." It allows us to create a construction whose result depends in various ways on a subconstruction. We will delimit a subconstruction using brackets:

a R1 b, [ z R2 b ], ...

The opening bracket [ delimits subsequent bindings. The construction within brackets is executed normally within the current context (so the b inside refers to the current binding to b coming from the earlier R1 clause, but z is a new name, and may produce zero or more bindings).

Upon encountering the closing bracket ], the entire set of context extensions, beginning after the corresponding [, are collected together. The collection of blobs (a result) is pushed onto a stack. In the example above, z is no longer a bound name after ].

! A blob contains a stack of results.

We can depict the result of the query above like this:

{ [a:A,b:B] ; {[z:C], [z:A]}
, [a:B,b:C] ; {[z:B]}
, [a:C,b:A] ; {}
}

NonEmpty, Unique, Empty

Two basic methods are available to observe (the top result on) the subconstruction stack:

  1. We may assert that the top result is empty, non-empty, or contains exactly one unique element. This removes it from the stack.
  2. We may assert (1) about the top elements of all of the blobs in the top stack result. This removes the top stack result. See below for clarification.

Examples of (1):

  a R1 b, [ z R2 b ] Unique

asserts that there must be a unique R2 edge pointing at b from some z. This will yield {[a:B,b:C]}.

Similarly,

  a R1 b, [ z R2 b ] NonEmpty

yields {[a:A,b:B], [a:B,b:C]}, and

  a R1 b, [ z R2 b ] Empty

yields [a:C,b:A].

The construction (2) supports universal quantification:

  [x P y, [y P x]] All NonEmpty

asserts that for all x and y such that [x P y], [y P x] must hold.

Creation

~

Now, we no longer view the graph as a static entity, but one whose contents can change as a result of a construction. This facility can create new nodes and edges.

! A blob contains a set of new edges, possibly referring to new nodes. We will begin writing a blob as a tuple (B Bindings Stack Edges) with three components. Added edges become visible to subsequent constraint clauses.

We mark assertion clauses using ~. The construction

  a R1 b, b ~R1 a

applied to graph1 will yield a result with three blobs; one of them is:

  (B [a:A, b:B] [] [B R1 A])

containing the reversal of A R1 B.

If an unbound name is used in an assertion clause, it is assumed to be drawn from some set of fresh nodes. It has no preexisting relation to any other node, but becomes part of the context as with any other binding operation and can be referenced or captured by subsequent clauses. These are called "invented values" in 1.

When a construction does not make use of assertion, we may call it a "query", and when its result is nonempty say that it has succeeded, or that it is true.

Merge

This construction takes each blob in the top result of the stack and merges their new edges into a single list. This takes a set of possible additions and combine them into the current branch, simultaneously. Thus,

 [ a R1 b, b ~R1 a ] Merge

will consist of a single blob with no bindings, an empty stack, and three new reversed R1 edges:

  (B [] [] [B R1 A, C R1 B, A R1 C])

This construction simulates the "deterministic semantics" of 1.

Time

(sketchy)

Presenting a graph "before" and "after" a construction is applied creates a measure of time. We suppose that each edge in a graph is timestamped, either by attaching some explicit data to it, or by means of the ambient context (edges might simply be written down, sequentially). When a construct asserts a new edge, it is given a fresh timestamp. We should take note that Merge leads to simultaneity; new edges coming from branches of a subconstruction will have not necessarily unique timestamps.

To note this more carefullly, we make our timestamps hierarchical, and mark all the edges coming from Merge with a common group time.

For instance:

  ..., a ~R b, a ~R c, [x R y, y ~R x, y ~R y] Merge, d ~R e, ...

should yield an "append log" such as:

  ...
  0    a R b
  1    a R c
  2:2  b R a
  2:3  b R b
  2:2  c R a
  2:3  c R c
  3    d R e
  ...

Indirection

We allow the binding of constructions to names. A construction thus bound may mark some of its names as bound; these are called parameters of the bound construction. To recognize a "pair", for instance:

pair t a b :=
  t First a, t Second b.

The construction has three parameters and is bound to the name pair. Not all names used by a construction must be bound:

grandparent g c :=
  g Parent x, x Parent c.

Applying this construction will possibly introduce g and c into the calling context, but will not expose x.

In general, applying a named construction subsitutes its operations into the current operation list, after substituting the arguments for its bound variables and renaming to avoid variable capture. To write an application, we write the name followed by arguments within a clause:

x Knows y, [grandparent y c] Unique

This construction returns all nodes that x knows and that have exactly one grandchild.

Disjunction

(sketchy)

Disjoin

The Disjoin operation takes a set of names and consumes the entire stack. It is expected that every blob within each (non-empty) result on the stack has bindings available for the set of names, and that the names are not bound in the current context. The meaning of Disjoin is to take the first non-empty result on the stack and extend the current context with its bindings for the set of names.

For example:

[a R b], [a P b], Disjoin [a,b]

should either yield the bindings from the first subquery, if R is nonempty, from the second query, if R is empty but P is nonempty, or else failure, if both are empty.

[[predicate x], Disjoin [x]]

Is an indirect way of writing

[predicate x]

We can use disjunction to write interesting recursive constructions. For instance, we can recognize binary trees:

leaf x := x Tag Leaf.
branch x l r := x Tag Branch, x Left l, x Right r.
tree x :=
 [ [leaf x],
   [branch x l r, tree l, tree r],
   Disjoin [x] ]

This construction succeeds exactly when x satisfies leaf, or when x satisfies branch for some l,r and they each satisfy tree.