$Id: env.html,v 1.2 1999/04/22 04:38:15 jimb Exp $
This is a draft proposal for a new datatype for representing top-level environments in Guile. Upon completion, this proposal will be posted to the mailing list `guile@cygnus.com' for discussion, revised in light of whatever insights that may produce, and eventually implemented.
Note that this is not a proposal for a module system; rather, it is a proposal for a data structure which encapsulates the ideas one needs when writing a module system, and, most importantly, a fixed interface which insulates the interpreter from the details of the module system. Using these environments, one could implement any module system one pleased, without changing the interpreter.
I hope this text will eventually become a chapter of the Guile manual; thus, the description of environments is written in the present tense, as if it were already implemented, not in the future tense. However, this text does not actually describe the present state of Guile.
I'm especially interested in improving the vague, rambling presentation of environments in the section "Modules and Environments". I'm trying to orient the user for the discussion that follows, but I wonder if I'm just confusing the issue. I would appreciate suggestions if they are concrete -- please provide new wording.
Note also: I'm trying out a convention I'm considering for use in the manual. When a Scheme procedure is directly implemented by a C procedure, and both are useful to call from their respective languages, we document the Scheme procedure only, and call it a "Primitive". If a Scheme function is marked as a primitive, you can derive the name of the corresponding C function by changing - to _, ! to _x, ? to _p, and prepending scm_. The C function's arguments will be all of the Scheme procedure's arguments, both required and optional; if the Scheme procedure takes a "rest" argument, that will be a final argument to the C function. The C function's arguments, as well as its return type, will be SCM.
Thus, a procedure documented like this:
has a corresponding C function which would be documented like this:
The hope is that this will be an uncluttered way to document both the C and Scheme interfaces, without unduly confusing users interested only in the Scheme level.
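The renaming rule above is mechanical enough to sketch in a few lines of C. The helper below is purely illustrative: primitive_c_name is a hypothetical name, not a Guile function, and it assumes the rule is exactly the four substitutions listed above with every other character copied unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Guile): write the C name implied by the
   naming convention for the Scheme primitive NAME into OUT, which holds
   OUT_SIZE bytes.  Returns 0 on success, or -1 if OUT is too small.  */
int
primitive_c_name (const char *name, char *out, size_t out_size)
{
  static const char prefix[] = "scm_";
  size_t used = sizeof prefix - 1;

  assert (name != NULL && out != NULL);
  if (out_size <= used)
    return -1;
  memcpy (out, prefix, used);

  for (; *name != '\0'; name++)
    {
      /* Each Scheme character maps to one or two C characters.  */
      char plain[2] = { *name, '\0' };
      const char *piece;

      switch (*name)
        {
        case '-': piece = "_";  break;
        case '!': piece = "_x"; break;
        case '?': piece = "_p"; break;
        default:  piece = plain; break;
        }

      size_t len = strlen (piece);
      if (used + len >= out_size)   /* keep room for the terminating NUL */
        return -1;
      memcpy (out + used, piece, len);
      used += len;
    }

  out[used] = '\0';
  return 0;
}
```

Given "environment-ref", this sketch produces "scm_environment_ref", the C name used elsewhere in this text; given "environment-set!", it produces "scm_environment_set_x".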
When there is a C function which provides the same functionality as a primitive, but with a different interface tailored for C's needs, it usually has the same name as the primitive's C function, with the suffix _internal. Thus, scm_environment_ref_internal is almost identical to scm_environment_ref, except that it indicates an unbound variable in a manner friendlier to C code.
In Guile, an environment is a mapping from symbols onto variables, and a variable is a location containing a value. Guile uses the datatype described here to represent its top-level environments.
Guile distinguishes between environments and modules. A module is a unit of code sharing; it has a name, like (math random), an implementation (e.g., Scheme source code, a dynamically linked library, or a set of primitives built into Guile), and finally, an environment containing the definitions which the module exports for its users.
An environment, by contrast, is simply an abstract data type representing a mapping from symbols onto variables which the Guile interpreter uses to look up top-level definitions. The eval procedure interprets its first argument, an expression, in the context of its second argument, an environment.
Guile uses environments to implement its module system. A module created by loading Scheme code might be built from several environments. In addition to the environment of exported definitions, such a module might have an internal top-level environment, containing both exported and private definitions, and perhaps environments for imported definitions alone and local definitions alone.
The interface described here includes a full set of functions for mutating environments, and the system goes to some length to maintain its consistency as environments' bindings change. This is necessary because Guile is an interactive system. The user may create new definitions or modify and reload modules while Guile is running; the system should handle these changes in a consistent and predictable way.
A typical Guile system will have several distinct top-level environments. (This is why we call them "top-level", and not "global".) For example, consider the following fragment of an interactive Guile session:
guile> (use-modules (ice-9 regex))
guile> (define pattern "^(..+)\\1+$")
guile> (string-match pattern "xxxx")
#("xxxx" (0 . 4) (0 . 2))
guile> (string-match pattern "xxxxx")
#f
guile>
Guile evaluates the expressions the user types in a top-level environment reserved for that purpose; the definition of pattern goes there. That environment is distinct from the one holding the private definitions of the (ice-9 regex) module. At the Guile prompt, the user does not see the module's private definitions, and the module is unaffected by definitions the user makes at the prompt. The use-modules form copies the module's public bindings into the user's environment.
All Scheme evaluation takes place with respect to some top-level environment. Just as the procedure created by a lambda form closes over any local scopes surrounding that form, it also closes over the surrounding top-level environment. Thus, since the string-match procedure is defined in the (ice-9 regex) module, it closes over that module's top-level environment. When the user calls string-match from the Guile prompt, any free variables in string-match's definition are therefore resolved with respect to the module's top-level environment, not the user's.
Although the Guile interaction loop maintains a "current" top-level environment in which it evaluates the user's input, it would be misleading to extend the concept of a "current top-level environment" to the system as a whole. Each procedure closes over its own top-level environment, in which that procedure will find bindings for its free variables. Thus, the top-level environment in force at any given time depends on the procedure Guile happens to be executing. The global "current" environment is a figment of the interaction loop's imagination.
Since environments provide all the operations the Guile interpreter needs to evaluate code, they effectively insulate the interpreter from the details of the module system. Without changing the interpreter, you can implement any module system you like, as long as its efforts produce an environment object the interpreter can consult.
Finally, environments may prove a convenient way for Guile to access the features of other systems. For example, one might export the GIMP's Procedural Database to Guile as a custom environment type; this environment could create Scheme procedure objects corresponding to GIMP procedures, as the user referenced them.
This section describes the common set of operations that all environment objects support. To create an environment object, or to perform an operation specific to a particular kind of environment, see section Standard Environment Types.
In this section, the following names for formal parameters imply that the actual parameters must have a certain type:
Returns #t if object is an environment, or #f otherwise.
If symbol is unbound in env, signals an environment:unbound error (see section Environment Errors).
Returns #t if symbol is bound in env, or #f otherwise.
For each binding in env, apply proc to the symbol bound, its value, and the result from the previous application of proc. Use init as proc's third argument the first time proc is applied.
If env contains no bindings, this function simply returns init.
If env binds the symbol sym1 to the value val1, sym2 to val2, and so on, then this procedure computes:
(proc sym1 val1 (proc sym2 val2 ... (proc symn valn init)))
Each binding in env will be processed exactly once.
environment-fold makes no guarantees about the order in which the bindings are processed.
Here is a function which, given an environment, constructs an association list representing that environment's bindings, using environment-fold:
(define (environment->alist env)
  (environment-fold env
                    (lambda (sym val tail)
                      (cons (cons sym val) tail))
                    '()))
Like environment-ref, except that if symbol is unbound in env, it returns the value SCM_UNDEFINED, instead of signalling an error.
The C counterpart of environment-fold. For each binding in env, make the call:
(*proc) (data, symbol, value, previous)
where previous is the value returned from the last call to *proc, or init for the first call. If env contains no bindings, return init.
The type of function passed to scm_environment_fold_internal.
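The C folding convention described above can be sketched without any Guile headers. Everything in the block below is a stand-in chosen for illustration: struct toy_env, toy_folder and toy_env_fold are hypothetical names, and plain strings and long values take the place of SCM symbols and values. Only the calling pattern, (*proc) (data, symbol, value, previous), follows the text.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for illustration only: a real environment holds SCM values.  */
struct binding { const char *symbol; long value; };
struct toy_env { const struct binding *bindings; size_t count; };

/* Modeled on the folder calling convention described in the text.  */
typedef long toy_folder (void *data, const char *symbol, long value,
                         long previous);

/* Call PROC once per binding, threading the previous result through;
   return INIT unchanged when ENV has no bindings.  */
long
toy_env_fold (const struct toy_env *env, toy_folder *proc, void *data,
              long init)
{
  long result = init;
  size_t i;

  assert (env != NULL && proc != NULL);
  /* This sketch happens to walk the array from its last element to its
     first; callers must not rely on any particular order.  */
  for (i = env->count; i > 0; i--)
    result = proc (data, env->bindings[i - 1].symbol,
                   env->bindings[i - 1].value, result);
  return result;
}

/* Example folder: sum the values, and count the bindings through DATA.  */
long
sum_and_count (void *data, const char *symbol, long value, long previous)
{
  size_t *count = data;

  (void) symbol;
  *count += 1;
  return previous + value;
}
```

The data argument lets a C folder carry state that a Scheme closure would capture for free; here it points at a counter owned by the caller.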
Here are functions for changing symbols' bindings and values.
Although it is common to say that an environment binds a symbol to a value, this is not quite accurate; an environment binds a symbol to a location, and the location contains a value. In the descriptions below, we will try to make clear how each function affects bindings and locations.
Note that some environments may contain some immutable bindings, or may bind symbols to immutable locations. If you attempt to change an immutable binding or value, these functions will signal an environment:immutable-binding or environment:immutable-location error. However, simply because a binding cannot be changed via these functions does not imply that it is constant. Mechanisms outside the scope of this section (say, re-loading a module's source code) may change a binding or value which is immutable via these functions.
If symbol is already bound in env, and the binding is immutable, signal an environment:immutable-binding error.
If symbol is already bound in env, and the binding is immutable, signal an environment:immutable-binding error.
If symbol is not bound in env, signal an environment:unbound error. If env binds symbol to an immutable location, signal an environment:immutable-location error.
Some applications refer to variables' values so frequently that the overhead of environment-ref and environment-set! is unacceptable. For example, variable reference speed is a critical factor in the performance of the Guile interpreter itself. If an application can tolerate some additional complexity, the environment-cell function described here can provide very efficient access to variable values.
In the Guile interpreter, most variables are represented by pairs; the CDR of the pair holds the variable's value. Thus, a variable reference corresponds to taking the CDR of one of these pairs, and setting a variable corresponds to a set-cdr! operation. A pair used to represent a variable's value in this manner is called a value cell. Value cells represent the "locations" to which environments bind symbols.
The environment-cell function returns the value cell bound to a symbol. For example, an interpreter might make the call (environment-cell env symbol #t) to find the value cell which env binds to symbol, and then use cdr and set-cdr! to reference and assign to that variable, instead of calling environment-ref or environment-set! for each variable reference.
There are a few caveats that apply here:
An environment may return #f in response to a request for a symbol's value cell; in this case, the caller must use environment-ref and environment-set! to manipulate the variable.
An environment's binding for a symbol may change. Once a program has used environment-cell to obtain the variable's value cell, it no longer needs to use environment-ref and environment-set! to access the variable, and it may not see the new binding. Thus, code which uses environment-cell should almost always use environment-observe to track changes to the symbol's binding; this is the additional complexity hinted at above. See section Observing Changes to Environments.
Some variables should be immutable. If a program uses environment-cell to obtain the value cell of such a variable, then it is impossible for the environment to prevent the program from changing the variable's value, using set-cdr!. However, this is discouraged; it is probably better to redesign the interface than to disregard such a request. To make it easy for programs to honor the immutability of a variable, environment-cell takes an argument indicating whether the caller intends to mutate the cell's value; if this argument is true and the variable is immutable, then environment-cell signals an environment:immutable-location error.
Programs should therefore make separate calls to environment-cell to obtain value cells for reference and for assignment. It is incorrect for a program to call environment-cell once to obtain a value cell, and then use that cell for both reference and mutation.
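The caveats above can be made concrete with a small model. The sketch below is not Guile code: struct cell stands in for a value cell (which in Guile is a pair), and toy_env_cell, enum cell_status and the other names are hypothetical. It assumes that an unbound symbol and an immutable location are reported through a status code rather than by signalling an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for illustration only; Guile's value cells are pairs.  */
struct cell { long value; };
struct slot { const char *symbol; struct cell *cell; int immutable; };
struct toy_env { struct slot *slots; size_t count; };

enum cell_status { CELL_OK, CELL_UNBOUND, CELL_IMMUTABLE_LOCATION };

/* Find the cell bound to SYMBOL.  When FOR_WRITE is non-zero and the
   location is immutable, refuse, mirroring the error described above.
   A NULL result with CELL_OK means the binding does not live in a cell,
   so the caller must fall back to the slower ref and set operations.  */
struct cell *
toy_env_cell (struct toy_env *env, const char *symbol, int for_write,
              enum cell_status *status)
{
  size_t i;

  assert (env != NULL && symbol != NULL && status != NULL);
  for (i = 0; i < env->count; i++)
    if (strcmp (env->slots[i].symbol, symbol) == 0)
      {
        if (for_write && env->slots[i].immutable)
          {
            *status = CELL_IMMUTABLE_LOCATION;
            return NULL;
          }
        *status = CELL_OK;
        return env->slots[i].cell;
      }
  *status = CELL_UNBOUND;
  return NULL;
}
```

A caller following the advice above would ask twice, once with for_write zero for the cell it reads through and once with for_write non-zero for the cell it assigns through, and would cache both results only for as long as the binding is known not to have changed.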
Returns the value cell which env binds to symbol, or #f if the binding does not live in a value cell.
The argument for-write indicates whether the caller intends to modify the variable's value by mutating the value cell. If for-write is true and the variable is immutable, then environment-cell signals an environment:immutable-location error.
If symbol is unbound in env, signal an environment:unbound error.
If you use this function, you should consider using environment-observe, to be notified when symbol gets re-bound to a new value cell, or becomes undefined.
Like environment-cell, except that if symbol is unbound in env, it returns the value SCM_UNDEFINED, instead of signalling an error.
[[After we have some experience using this, we may find that we want to be able to explicitly ask questions like, "Is this variable mutable?" without the annoyance of error handling. But maybe this is fine.]]
The procedures described here allow you to add and remove observing procedures for an environment.
A program may register an observing procedure for an environment, which will be called whenever a binding in a particular environment changes. For example, if the user changes a module's source code and re-loads the module, other parts of the system may want to throw away information they have cached about the bindings of the older version of the module. To support this, each environment retains a set of observing procedures which it will invoke whenever its bindings change. We say that these procedures observe the environment's bindings. You can register new observing procedures for an environment using environment-observe.
This function returns an object, token, which you can pass to environment-unobserve to remove proc from the set of procedures observing env. The type and value of token are unspecified.
If a call (environment-observe env proc) returns token, then the call (environment-unobserve token) will cause proc to no longer be called when env's bindings change.
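The register, notify and cancel cycle described above can be sketched in plain C. The names below (toy_env_observe, toy_env_unobserve, toy_env_bindings_changed) are hypothetical, and the model makes two simplifying assumptions that differ from the interface in this text: the token is just a slot index, and cancelling needs the environment as well as the token.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBSERVERS 8

struct toy_env;
typedef void toy_observer (struct toy_env *env, void *data);

/* Stand-in for illustration only: an environment reduced to its
   observer table.  An empty slot has a NULL proc.  */
struct toy_env
{
  struct { toy_observer *proc; void *data; } observers[MAX_OBSERVERS];
};

/* Register PROC; return a token (here a slot index), or -1 if full.  */
int
toy_env_observe (struct toy_env *env, toy_observer *proc, void *data)
{
  int i;

  assert (env != NULL && proc != NULL);
  for (i = 0; i < MAX_OBSERVERS; i++)
    if (env->observers[i].proc == NULL)
      {
        env->observers[i].proc = proc;
        env->observers[i].data = data;
        return i;
      }
  return -1;
}

/* Cancel the observation identified by TOKEN.  */
void
toy_env_unobserve (struct toy_env *env, int token)
{
  assert (env != NULL && token >= 0 && token < MAX_OBSERVERS);
  env->observers[token].proc = NULL;
}

/* Called by the environment itself whenever its bindings change.  */
void
toy_env_bindings_changed (struct toy_env *env)
{
  int i;

  for (i = 0; i < MAX_OBSERVERS; i++)
    if (env->observers[i].proc != NULL)
      env->observers[i].proc (env, env->observers[i].data);
}

/* Example observer: record that cached information is stale.  */
void
count_change (struct toy_env *env, void *data)
{
  (void) env;
  *(int *) data += 1;
}
```

An interpreter that caches value cells would register an observer like count_change and discard its cached cells whenever the counter moves.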
There are some limitations on observation:
When writing observing procedures, pay close attention to garbage collection issues. If you use environment-observe to register observing procedures for an environment, the environment will hold a reference to those procedures; while that environment is alive, its observing procedures will live, as will any data they close over. If this is not appropriate, you can use the environment-observe-weak procedure to create a weak reference from the environment to the observing procedure.
For example, suppose an interpreter uses environment-cell to reference variables efficiently, as described above in section Caching Environment Lookups. That interpreter must register observing procedures to track changes to the environment. If those procedures retain any reference to the data structure representing the program being interpreted, then that structure cannot be collected as long as the observed environment lives. This is almost certainly incorrect: if there are no other references to the structure, it can never be invoked, so it should be collected. In this case, the interpreter should register its observing procedure using environment-observe-weak, and retain a pointer to it from the code it updates. Thus, when the code is no longer referenced elsewhere in the system, the weak link will be broken, and Guile will collect the code (and its observing procedure).
Like environment-observe, except that the reference env retains to proc is a weak reference. This means that, if there are no other live, non-weak references to proc, it will be garbage-collected, and dropped from env's list of observing procedures.
It is also possible to write code that observes an environment in C. The scm_environment_observe_internal function registers a C function to observe an environment. The typedef scm_environment_observer is the type a C observer function must have.
The C counterpart of environment-observe. Whenever env's bindings change, call the function proc, passing it env and data. If weak_p is non-zero, env will retain only a weak reference to data, and if data is garbage collected, the entire observation will be dropped.
This function returns a token, with the same meaning as those returned by environment-observe.
The function passed to scm_environment_observe_internal should have the type scm_environment_observer.
Note that, like all other primitives, environment-observe is also available from C, under the name scm_environment_observe.
Here are the error conditions signalled by the environment routines described above. In these conditions, func is a string naming a particular procedure.
Guile supports several different kinds of environments. The operations described above are actually only the common functionality provided by all the members of a family of environment types, each designed for a separate purpose.
Each environment type has a constructor procedure for building elements of that type, and extends the set of common operations with its own procedures, providing specialized functions. For an example of how these environment types work together, see section Modules of Interpreted Scheme Code.
Guile allows users to define their own environment types. Given a set of procedures that implement the common environment operations, Guile will construct a new environment object based on those procedures.
A finite environment is simply a mutable set of definitions. A finite environment supports no operations beyond the common set.
Returns #t if object is a finite environment, or #f otherwise.
In Guile, each module of interpreted Scheme code uses a finite environment to hold the definitions made in that module.
A module's source code refers to definitions imported from other modules, and definitions made within itself. An eval environment combines two environments -- a local environment and an imported environment -- to produce a new environment in which both sorts of references can be resolved.
Applying environment-define or environment-undefine to eval has the same effect as applying the procedure to local.
Note that eval incorporates local and imported by reference -- if, after creating eval, the program changes the bindings of local or imported, those changes will be visible in eval.
Since most Scheme evaluation takes place in eval environments, they transparently cache the bindings received from local and imported. Thus, the first time the program looks up a symbol in eval, eval may make calls to local or imported to find their bindings, but subsequent references to that symbol will be as fast as references to bindings in finite environments.
In typical use, local will be a finite environment, and imported will be an import environment, described below.
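The way an eval environment combines its two parts can be sketched in plain C. The names here (struct toy_eval_env, toy_eval_lookup, toy_eval_define) are hypothetical, strings and long values stand in for symbols and Scheme values, and the sketch assumes that references try local first and then imported. It leaves out the caching described above, and redefinition simply overwrites the old value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 16

/* Stand-ins for illustration only.  */
struct toy_env
{
  const char *symbols[MAX_BINDINGS];
  long values[MAX_BINDINGS];
  size_t count;
};

/* Return a pointer to SYMBOL's value in ENV, or NULL if unbound.  */
long *
toy_env_lookup (struct toy_env *env, const char *symbol)
{
  size_t i;

  assert (env != NULL && symbol != NULL);
  for (i = 0; i < env->count; i++)
    if (strcmp (env->symbols[i], symbol) == 0)
      return &env->values[i];
  return NULL;
}

/* Bind SYMBOL to VALUE in ENV, overwriting any existing value.
   Returns 0 on success, or -1 if ENV is full.  */
int
toy_env_define (struct toy_env *env, const char *symbol, long value)
{
  long *existing = toy_env_lookup (env, symbol);

  if (existing != NULL)
    {
      *existing = value;
      return 0;
    }
  if (env->count == MAX_BINDINGS)
    return -1;
  env->symbols[env->count] = symbol;
  env->values[env->count] = value;
  env->count++;
  return 0;
}

/* An eval environment holds its two parts by reference, not by copy.  */
struct toy_eval_env { struct toy_env *local, *imported; };

/* References consult LOCAL first, then IMPORTED.  */
long *
toy_eval_lookup (struct toy_eval_env *eval, const char *symbol)
{
  long *found = toy_env_lookup (eval->local, symbol);

  return found != NULL ? found : toy_env_lookup (eval->imported, symbol);
}

/* Definitions always go to LOCAL.  */
int
toy_eval_define (struct toy_eval_env *eval, const char *symbol, long value)
{
  return toy_env_define (eval->local, symbol, value);
}
```

Because the parts are held by pointer, a binding added to imported after the eval environment was built is visible through toy_eval_lookup, and a local definition shadows an imported one without touching it.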
Returns #t if object is an eval environment, or #f otherwise.
An import environment combines the bindings of a set of argument environments, and checks for naming clashes.
If two different elements of imports have a binding for the same symbol, apply conflict-proc to the two environments. If the bindings of any of the imports ever change, check for conflicts again.
All bindings in imp are immutable. If you apply environment-define or environment-undefine to imp, Guile will signal an environment:immutable-binding error. However, notice that the set of bindings in imp may still change, if one of its imported environments changes.
Returns #t if object is an import environment, or #f otherwise.
I'm not at all sure about the way conflict-proc works. I think module systems should warn you if it seems you're likely to get the wrong binding, but exactly how and when those warnings should be generated, I don't know.
An export environment restricts an environment to a specified set of bindings.
The environment exp binds symbol to location when private does and symbol is exported by signature.
Signature is a list specifying which of the bindings in private should be visible in exp. Each element of signature should be a list of the form:
(symbol attribute ...)
where each attribute is one of the following:
mutable-location: exp passes calls such as environment-cell directly through to private.
immutable-location: if the program applies environment-set! to exp and symbol, or calls environment-cell to obtain a writable value cell, the call signals an environment:immutable-location error.
Note that, even if an export environment treats a location as immutable, the underlying environment may treat it as mutable, so its value may change.
It is an error for an element of signature to specify both mutable-location and immutable-location. If neither is specified, immutable-location is assumed.
As a special case, if an element of signature is a lone symbol sym, it is equivalent to an element of the form (sym).
All bindings in exp are immutable. If you apply environment-define or environment-undefine to exp, Guile will signal an environment:immutable-binding error. However, notice that the set of bindings in exp may still change, if the bindings in private change.
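The effect of a signature on assignment can be sketched in plain C. The names below (struct export_entry, toy_export_set, enum set_status) are hypothetical, strings and long values stand in for symbols and Scheme values, and errors are reported as status codes rather than signalled. A zero mutable_location flag plays the role of the default, immutable-location.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for illustration only.  */
struct binding { const char *symbol; long value; };
struct export_entry { const char *symbol; int mutable_location; };

struct toy_export_env
{
  struct binding *private_bindings;   /* shared with the private environment */
  size_t private_count;
  const struct export_entry *signature;
  size_t signature_count;
};

enum set_status { SET_OK, SET_UNBOUND, SET_IMMUTABLE_LOCATION };

/* Return the signature entry for SYMBOL, or NULL if it is not exported.  */
static const struct export_entry *
find_export (const struct toy_export_env *exp, const char *symbol)
{
  size_t i;

  for (i = 0; i < exp->signature_count; i++)
    if (strcmp (exp->signature[i].symbol, symbol) == 0)
      return &exp->signature[i];
  return NULL;
}

/* Assign through EXP: only exported symbols are visible, and only those
   marked mutable-location may be assigned.  */
enum set_status
toy_export_set (struct toy_export_env *exp, const char *symbol, long value)
{
  const struct export_entry *entry;
  size_t i;

  assert (exp != NULL && symbol != NULL);
  entry = find_export (exp, symbol);
  if (entry == NULL)
    return SET_UNBOUND;
  for (i = 0; i < exp->private_count; i++)
    if (strcmp (exp->private_bindings[i].symbol, symbol) == 0)
      {
        if (!entry->mutable_location)
          return SET_IMMUTABLE_LOCATION;
        exp->private_bindings[i].value = value;
        return SET_OK;
      }
  return SET_UNBOUND;
}
```

Since the export environment shares storage with the private one, code holding the private environment can still change a value that the export environment refuses to assign, which matches the note above about immutable locations.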
Returns #t if object is an export environment, or #f otherwise.
[[user provides the procedures]] [[A observes B and C; B observes C; C changes; A should only be notified once, right?]] [[observation loops?]]
This section describes how to implement new environment types in Guile.
Guile's internal representation of environments allows you to extend Guile with new kinds of environments without modifying Guile itself. Every environment object carries a pointer to a structure of pointers to functions implementing the common operations for that environment. The procedures environment-ref, environment-set!, etc. simply find this structure and invoke the appropriate function.
[[It would be nice to have an example around here. How about a persistent environment, bound to a directory, where ref and set actually access files? Ref on a directory would return another environment... Hey, let's import my home directory!]]
An environment object is a smob whose CDR is a pointer to a pointer to a struct environment_funcs:
struct environment_funcs {
  SCM (*ref) (SCM self, SCM symbol);
  SCM (*fold) (SCM self, scm_environment_folder *proc, SCM data, SCM init);
  void (*define) (SCM self, SCM symbol, SCM value);
  void (*undefine) (SCM self, SCM symbol);
  void (*set) (SCM self, SCM symbol, SCM value);
  SCM (*cell) (SCM self, SCM symbol, int for_write);
  SCM (*observe) (SCM self, scm_environment_observer *proc, SCM data, int weak_p);
  void (*unobserve) (SCM self, SCM token);
  SCM (*mark) (SCM self);
  scm_sizet (*free) (SCM self);
  int (*print) (SCM self, SCM port, scm_print_state *pstate);
};
You can use the following macro to access an environment's function table:
Returns a pointer to the struct environment_funcs for the environment env. If env is not an environment object, the behavior of this macro is undefined.
Here is what each element of env_funcs must do to correctly implement an environment. In all of these calls, self is the environment whose function is being invoked.
SCM ref (SCM self, SCM symbol);
Must behave like scm_environment_ref_internal (self, symbol). See section Examining Environments. Note that the ref element of a struct environment_funcs may be zero if a cell function is provided.
SCM fold (SCM self, scm_environment_folder *proc, SCM data, SCM init);
Must behave like scm_environment_fold_internal (self, proc, data, init). See section Examining Environments.
void define (SCM self, SCM symbol, SCM value);
Must behave like (environment-define self symbol value). See section Changing Environments.
void undefine (SCM self, SCM symbol);
Must behave like (environment-undefine self symbol). See section Changing Environments.
void set (SCM self, SCM symbol, SCM value);
Must behave like (environment-set! self symbol value). See section Changing Environments. Note that the set element of a struct environment_funcs may be zero if a cell function is provided.
SCM cell (SCM self, SCM symbol, int for_write);
Must behave like scm_environment_cell_internal (self, symbol). See section Caching Environment Lookups.
SCM observe (SCM self, scm_environment_observer *proc, SCM data, int weak_p);
Must behave like scm_environment_observe_internal (self, proc, data, weak_p). See section Observing Changes to Environments.
void unobserve (SCM self, SCM token);
SCM mark (SCM self);
scm_sizet free (SCM self);
Returns the number of bytes freed that were obtained using scm_must_malloc or scm_must_realloc.
SCM print (SCM self, SCM port, scm_print_state *pstate);
When you implement a new environment type, you will likely want to associate some data of your own design with each environment object. Since ANSI C promises that casts will safely convert between a pointer to a structure and a pointer to its first element, you can have the CDR of an environment smob point to your structure, as long as your structure's first element is a pointer to a struct environment_funcs. Then, your code can use the macro below to retrieve a pointer to the structure, and cast it to the appropriate type.
Returns the CDR of env, as a pointer to a pointer to an environment_funcs structure.
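The layout trick described above, a private structure whose first element points at the function table, can be tried out in plain C without Guile. The sketch below uses a much reduced, hypothetical table (struct toy_env_funcs with only ref and set), integer keys in place of symbols, and a hypothetical macro TOY_ENV_FUNCS standing in for the real accessor.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for illustration only: a much reduced function table.  */
struct toy_env_funcs
{
  long (*ref) (void *self, int key);
  void (*set) (void *self, int key, long value);
};

/* A custom environment type.  The pointer to the function table comes
   first, so a pointer to the whole structure can be converted to a
   pointer to that first member and back.  */
struct array_env
{
  const struct toy_env_funcs *funcs;
  long slots[4];
};

static long
array_ref (void *self, int key)
{
  struct array_env *env = self;

  assert (key >= 0 && key < 4);
  return env->slots[key];
}

static void
array_set (void *self, int key, long value)
{
  struct array_env *env = self;

  assert (key >= 0 && key < 4);
  env->slots[key] = value;
}

const struct toy_env_funcs array_env_funcs = { array_ref, array_set };

/* Generic code knows only that SELF begins with a table pointer.  */
#define TOY_ENV_FUNCS(self) (*(const struct toy_env_funcs **) (self))

long
toy_env_ref (void *self, int key)
{
  return TOY_ENV_FUNCS (self)->ref (self, key);
}

void
toy_env_set (void *self, int key, long value)
{
  TOY_ENV_FUNCS (self)->set (self, key, value);
}
```

The generic toy_env_ref and toy_env_set never mention struct array_env, so another environment type can be added by defining a new structure and a new table, leaving the dispatching code alone.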
[[perhaps a simple environment based on association lists]]
Here's what we'd need to do to today's Guile to install the system described above. This work would probably be done on a branch, because it involves crippling Guile while a lot of work gets done. Also, it could change the default set of bindings available pretty drastically, so the next minor release should not contain these changes.
After each step here, we should have a Guile that we can at least interact with, perhaps with some limitations.
Modify the interpreter to use environment-cell and environment-observe instead of the symbol value slots, first-class variables, etc. Modify the rest of libguile as necessary to register all the primitives in a single environment. We'll segregate them into modules later.
Once this is done, we can make the following simplifications to Guile:
scm_sym2ovcell, scm_intern_obarray_soft, etc. can go away. intern becomes simpler.
The material here is just a sketch. Don't take it too seriously. The point is that environments allow us to experiment without getting tangled up with the interpreter.
If a module is implemented by interpreted Scheme code, Guile represents it using several environments:
Each of these environments is implemented using a separate environment type. Some of these types, like the evaluation and import environments, actually just compute their bindings by consulting other environments; they have no bindings in their own right. They implement operations like environment-ref and environment-define by passing them through to the environments from which they are derived. For example, the evaluation environment will pass definitions through to the local environment, and search for references and assignments first in the local environment, and then in the import environment.
This document was generated on 21 April 1999 using the texi2html translator version 1.51.