Re: [Toss-devel] Specification of Hierarchical Terms
Status: Beta
Brought to you by:
lukaszkaiser
|
From: Lukasz S. <luk...@gm...> - 2012-06-29 23:36:08
|
I've fixed the specification considerably. I've had a break since
Tuesday but I'm back since Thursday evening.
{3 Specification of Hierarchical Terms}
Hierarchical terms have the form: [f(s_1, ..., s_m; t_1, ...,
t_n)] where [f] is a (function/constant/class) symbol, terms [s_1,
..., s_m] are called direct supertypes and terms [t_1, ..., t_n]
are called direct subterms of the term, or [X(s)], where [X] is a
variable symbol and term [s] is called the type of the variable.
When [f(s_1, ..., s_m; t_1, ..., t_n)], resp. [X(s)] is not a
subterm or supertype of another term, we call [f], resp. [X] its
head symbol. Hierarchical terms are related by [ISA] as formally
presented below; we drop "hierarchical" when referring to terms
for convenience.
We call the {i upper grounding} of a term [t], [ug(t)], the term
[t] with all occurrences of [X(s)] for any variable symbol [X] and
term [s] replaced by [s].
We start by defining [t ISA t'] for [t] and [t'] ground terms
(i.e. terms without occurrences of variable symbols).
It holds if and only if one of the following holds:
{ul
{- [t' = Type] for the top symbol [Type],}
{- [s_i ISA t'] for some i, where [t=f(s_1, ..., s_m; t_1, ..., t_n)],}
{- [s_i ISA s'_i] for all [i], [t_i ISA t'_i] for all [i],
where [t=f(s_1, ..., s_m; t_1, ..., t_n)] and
[t'=f(s'_1, ..., s'_m; t'_1, ..., t'_n)]}}
Actually we sometimes do not put a special case for the top symbol
[Type] into algorithms, and therefore we rely on well-formed sets
of declarations that no term except [Type] has no direct
supertypes.
We define {i ground substitution over variables [V]} by extending
into a function over terms in the standard way a finite mapping
from all variables in [V] to terms, requiring that [X(s)] is
mapped to [t] only when: [t] does not contain variable symbols,
and [t ISA ug(s)]. By [FV(t)] we denote the free variables of [t].
[t ISA t'] if and only if for all ground substitutions [S] over
[FV(t)] there is a ground substitution [R] over [FV(t')] such that
[S(t) ISA R(t')].
(We present [ISA] as describing generality, "is less (or equally)
general than". Notice that this relation is often presented as
describing the amount of information, the reverse quantity, as "has
more information than".)
We need two more, also essential notions.
A {i well-formed set of declarations} is defined as a set of terms
such that all of the following conditions hold:
{ul
{- no two terms in it have the same head symbol,}
{- head symbols of terms in it are not variables,}
{- there is at most one glb-unification result for any two (or
more) elements from the set,}
{- no term except [Type] has no direct supertypes,}
{- every symbol used in any term in the set is a head symbol of
some element of the set,}
{- for any subterm or supertype of a term from the set, its upper
grounding ISA the element of the set which has the same head
symbol.}}
Note that the last three conditions will be enforced by parsing the
term to be introduced into the set of declarations, so in the
current module we only need to check the first three
conditions. (The condition "must have direct supertypes" also
needs to be ensured in {!BuiltinLang}.)
A {i well-formed term with respect to a set of declarations} given
a well-formed set of declarations, is a term [t] such that:
{ul
{- its supertypes and subterms are well-formed (w.r.t. this set of
declarations),}
{- there is a term [d] in the set of declarations such
that [t ISA d] and either the head symbol of [t] is a variable or
it is the same as the head symbol of [d],}
{- for any two occurrences [X(s)] and [X(s')] of variable symbol
[X], [s=s'].}}
We limit all terms considered to terms that are well-formed with
respect to a set of declarations (fixed by the context; we will
drop constantly mentioning "w.r.t. a set of declarations"). In
particular, we only allow {i well-formed substitutions}:
substitutions that when applied to a well-formed term, return a
well-formed term, defined in the next section.
{3 "Unification Theory" for Hierarchical Terms}
A {i well-formed substitution} is defined by extending into a
function over terms in the standard way a finite mapping from
variables (terms whose head symbols are variable symbols) to
terms, requiring that [X(s)] is mapped to [t] only when [t ISA s].
We introduce an order on substitutions, by saying that [S ISA R]
when for all [t], [S(t) ISA R(t)].
We define {i isa-matching} a term [p] against a term [t] (where
[p] and [t] are well-formed) as the smallest substitution [R] such
that [t ISA R(p)]. In context of matching, [p] is called a
pattern. (There is probably no simpler algorithm to decide [t ISA
p] than isa-matching [p] against [t].)
We define {i eq-matching} a pattern [p] against a term [t] as the
problem of finding a substitution [R] such that [t = R(p)]. Note
that if eq-matching [p] against [t] exists, isa-matching also
exists and is the same substitution.
We define {i glb-unification} ({i Greatest Lower Bound
Unification}) of two well-formed terms [t_1, t_2] as the greatest
term [t] and smallest substitution [R] such that [f(;t,t) ISA
R(f(;t_1,t_2))] (where [f] is an arbitrary symbol).
We define {i eq-unification} of [t_1, t_2] as the standard
(first-order) {i Most General Unifier, mgu} [U] of [t_1, t_2],
together with the corresponding term [U(t_1)]. Note that if the
mgu of [t_1, t_2] exists, the glb-unification also exists and is
the same substitution.
{3 Extension to Function Types}
In type systems, types of functions interact with subtyping in
such a way that only the result type is covariant -- behaves like
supertypes and subterms above -- while argument types are
contravariant. To ensure proper behavior and have maximal
flexibility, we would need to define "least upper bound
unification" mutually recursively with glb-unification, for use in
contravariant positions. To simplify we sacrifice flexibility: we
use the standard first-order unification instead of
glb-unification on function argument types.
We introduce function type symbol [funtype] and higher-order
variables [X(s; t_1, ..., t_n)].
Formally, the definition of [ISA] extends to [funtype] as follows:
For [t=funtype( ; t_1, ..., t_{n-1}, t_n)], [t ISA t'] if and only
if for all ground substitutions [s] over [FV(t)] there is a ground
substitution [r] over [FV(t')] such that [r(t')=funtype( ; t'_1,
..., t'_{n-1}, t'_n)] and [s(t_n) ISA t'_n], [s(t_i) = t'_i] for
[1 <= i <= n-1].
The definition of [ISA] for [X(s_0; t_1, ..., t_n)] is the same as
that for [f(s_0; t_1, ..., t_n)], only the conditions for
well-formed terms change. A variable symbol can appear as both the
term [X(funtype( ; s_1, ..., s_n, s_0))] and a term [X(s_0; t_1,
..., t_n)] as long as [t_i ISA s_i] for [1 <= i <= n]. Moreover, a
term [f(funtype( ; t_1, ..., t_n, s_0); )] is well-formed if and
only if [f(s_0; t_1, ..., t_n)] is well-formed.
|