comment various region-related things better

Niko Matsakis 2012-07-24 08:12:15 -07:00
parent 7022ede9b3
commit 2d3a197f0e
2 changed files with 60 additions and 148 deletions


@@ -1,134 +1,9 @@
Region resolution. This pass runs before typechecking and resolves region
names to the appropriate block.

This seems to be as good a place as any to explain in detail how
region naming, representation, and type checking work.

### Naming and so forth

We really want regions to be very lightweight to use. Therefore,
unlike other named things, the scopes for regions are not explicitly
declared: instead, they are implicitly defined. Functions declare new
scopes: if the function is not a bare function, then as always it
inherits the names in scope from the outer scope. Within a function
declaration, new names implicitly declare new region variables. Outside
of function declarations, new names are illegal. To make this more
concrete, here is an example:

    fn foo(s: &a.S, t: &b.T) {
        let s1: &a.S = s; // a refers to the same a as in the decl
        let t1: &c.T = t; // illegal: cannot introduce new name here
    }

The code in this file is what actually handles resolving these names.
It creates a couple of maps that map from the AST node representing a
region ptr type to the resolved form of its region parameter. If new
names are introduced where they shouldn't be, then an error is
reported.

If regions are not given an explicit name, then the behavior depends
a bit on the context. Within a function declaration, all unnamed regions
are mapped to a single, anonymous parameter. That is, a function like:

    fn foo(s: &S) -> &S { s }

is equivalent to a declaration like:

    fn foo(s: &a.S) -> &a.S { s }

Within a function body or other non-binding context, an unnamed region
reference is mapped to a fresh region variable whose value can be
inferred as normal.
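The four naming rules above (named or unnamed region, in a declaration or in a body) can be sketched as a tiny resolver. This is a hypothetical illustration in present-day Rust, not the code in this file: `Resolver`, `Resolved`, and `Context` are invented names, and the sketch ignores names inherited from enclosing scopes.

```rust
/// Hypothetical resolved form of a region name (a stand-in for `ty::region`).
#[derive(Debug, Clone, PartialEq)]
pub enum Resolved {
    /// A named region parameter introduced by the enclosing fn declaration.
    Param(String),
    /// The single anonymous parameter shared by all unnamed regions in a declaration.
    AnonParam,
    /// A fresh region variable whose value will be inferred.
    Fresh(u32),
}

/// Where the region reference appears.
pub enum Context {
    FnDecl,
    FnBody,
}

pub struct Resolver {
    params: Vec<String>, // names introduced by the current fn declaration
    next_var: u32,       // counter for fresh region variables
}

impl Resolver {
    pub fn new() -> Self {
        Resolver { params: Vec::new(), next_var: 0 }
    }

    pub fn resolve(&mut self, name: Option<&str>, cx: Context) -> Result<Resolved, String> {
        match (name, cx) {
            // In a declaration, a new name implicitly declares a parameter.
            (Some(n), Context::FnDecl) => {
                if !self.params.iter().any(|p| p == n) {
                    self.params.push(n.to_string());
                }
                Ok(Resolved::Param(n.to_string()))
            }
            // In a body, a name must refer to one the declaration introduced.
            (Some(n), Context::FnBody) => {
                if self.params.iter().any(|p| p == n) {
                    Ok(Resolved::Param(n.to_string()))
                } else {
                    Err(format!("cannot introduce new region name `{}` here", n))
                }
            }
            // Unnamed regions in a declaration all share one anonymous parameter.
            (None, Context::FnDecl) => Ok(Resolved::AnonParam),
            // Unnamed regions in a body become fresh inference variables.
            (None, Context::FnBody) => {
                let v = self.next_var;
                self.next_var += 1;
                Ok(Resolved::Fresh(v))
            }
        }
    }
}
```

Resolving `foo`'s header first (`a`, `b`) and then its body reproduces the example: `a` resolves to the declared parameter, while `c` is rejected.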
The resolved form of regions is `ty::region`. Before I can explain
why this type is set up the way it is, I have to digress a little bit
into some ill-explained type theory.

### Universal Quantification

Regions are more complex than type parameters because, unlike type
parameters, they can be universally quantified within a type. To put
it another way, you cannot (at least at the time of this writing) have
a variable `x` of type `fn<T>(T) -> T`. You can have an *item* of
type `fn<T>(T) -> T`, but whenever it is referenced within a method,
that type parameter `T` is replaced with a concrete type *variable*
`$T`. To make this more concrete, imagine this code:

    fn identity<T>(x: T) -> T { x }
    let f = identity; // f has type fn($T) -> $T
    f(3u);            // $T is bound to uint
    f(3);             // Type error

You can see here that a type error will result because the type of `f`
(as opposed to the type of `identity`) is not universally quantified
over `$T`. That's fancy math speak for saying that the type variable
`$T` refers to a specific type that may not yet be known, unlike the
type parameter `T` which refers to some type which will never be
known.

Anyway, regions work differently. If you have an item of type
`fn(&a.T) -> &a.T` and you reference it, its type remains the same:
only when the function *is called* is `&a` instantiated with a
concrete region variable. This means you could call it twice and give
different values for `&a` each time.

This more general form is possible for regions because they do not
impact code generation. We do not need to monomorphize functions
differently just because they contain region pointers. In fact, we
don't really do *anything* differently.
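Present-day Rust still draws this distinction, so it offers a compilable analogy (modern syntax, not the 2012 dialect used in this file): a lifetime that appears in a function's argument types stays universally quantified, written `for<'a>`, when the function is used as a value, whereas a type parameter is fixed by inference at the point of use. `identity`, `first`, and `demo` are illustrative names.

```rust
/// Generic over a *type*: each use of `identity` as a value fixes `T`.
fn identity<T>(x: T) -> T {
    x
}

/// Generic over a *lifetime*: `'a` stays universally quantified in the
/// function's type and is instantiated separately at each call.
fn first<'a>(s: &'a str) -> &'a str {
    s
}

/// Uses `identity` once and `first` twice, with two different lifetimes.
fn demo() -> (u32, String, String) {
    let f = identity; // like `$T`: a single type, inferred from the call below
    let n = f(3u32); // `T` is now fixed to `u32` for `f`
    // f("three");   // would not compile: `f` is no longer polymorphic

    // The annotation spells out that `g` is still quantified over `'a`.
    let g: for<'a> fn(&'a str) -> &'a str = first;
    let owned = String::from("heap");
    let short = g(owned.as_str()).to_string(); // 'a = the borrow of `owned`
    let long: &'static str = g("static"); // 'a = 'static
    (n, short, long.to_string())
}
```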
### Representing regions; or, why do I care about all that?

The point of this discussion is that the representation of regions
must distinguish between a *bound* reference to a region and a *free*
reference. A bound reference is one which will be replaced with a
fresh region variable when the function is called, like the type
parameter `T` in `identity`. Bound references can only appear within
function types. A free reference is a region that may not yet be
concretely known, like the variable `$T`.

To see why we must distinguish them carefully, consider this program:

    fn item1(s: &a.S) {
        let choose = fn@(s1: &a.S) -> &a.S {
            if some_cond { s } else { s1 }
        };
    }

Here, the variable `s1: &a.S` that appears within the `fn@` is a free
reference to `a`. That is, when you call `choose()`, you don't
replace `&a` with a fresh region variable, but rather you expect `s1`
to be in the same region as the parameter `s`.

But in this program, this is not the case at all:

    fn item2() {
        let identity = fn@(s1: &a.S) -> &a.S { s1 };
    }

Here `&a` is introduced by the `fn@` itself, so it is bound in the
closure's type and gets a fresh region variable on each call to
`identity()`.

To distinguish between these two cases, `ty::region` contains two
variants: `re_bound` and `re_free`. In `item1()`, the outer reference
to `&a` would be `re_bound(rid_param("a", 0u))`, and the inner reference
would be `re_free(rid_param("a", 0u))`. In `item2()`, the inner reference
would be `re_bound(rid_param("a", 0u))`.
#### Implications for typeck

In typeck, whenever we call a function, we must go over and replace
all references to `re_bound()` regions within its parameters with
fresh region variables (we do not, however, replace bound regions within
nested function types, as those nested functions have not yet been
called).

Also, when we typecheck the *body* of an item, we must replace all
`re_bound` references with `re_free` references. This means that the
region in the type of the argument `s` in `item1()` *within `item1()`*
is not `re_bound(rid_param("a", 0u))` but rather
`re_free(rid_param("a", 0u))`. This is because, for any particular
*invocation of `item1()`*, `&a` will be bound to some specific region,
and hence it is no longer bound.

This file actually contains two passes related to regions. The first
pass builds up the `region_map`, which describes the parent links in
the region hierarchy. The second pass infers which types must be
region parameterized.
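The two operations described under "Implications for typeck" can be sketched on a toy type representation. This is a hypothetical model, not the compiler's code: `Region`, `Ty`, `map_bound`, `instantiate`, and `liberate` are invented names, regions are identified by name strings rather than `rid_param`, and it assumes that liberation, like instantiation, leaves nested function types alone because their bound regions belong to the nested function.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ast::node_id`.
type NodeId = u32;

/// Toy region representation: only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Region {
    /// A region parameter still bound by the enclosing fn type.
    Bound(String),
    /// A bound parameter made free while checking the body of item `NodeId`.
    Free(NodeId, String),
    /// A fresh region variable created at a call site.
    Var(u32),
}

/// Toy types: an integer, a region pointer `&r.T`, or a fn type.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Ptr(Region, Box<Ty>),
    Fn(Vec<Ty>, Box<Ty>),
}

/// Rewrites each `Bound` region in `ty` with `f`, but does not descend into
/// nested fn types: their bound regions belong to the nested function.
fn map_bound(ty: &Ty, f: &mut dyn FnMut(&str) -> Region) -> Ty {
    match ty {
        Ty::Int => Ty::Int,
        Ty::Ptr(r, inner) => {
            let r2 = match r {
                Region::Bound(name) => f(name.as_str()),
                other => other.clone(),
            };
            Ty::Ptr(r2, Box::new(map_bound(inner, f)))
        }
        Ty::Fn(..) => ty.clone(),
    }
}

/// At a call site: replace the callee's bound regions with fresh variables,
/// using one variable per distinct region name.
fn instantiate(inputs: &[Ty], output: &Ty, next_var: &mut u32) -> (Vec<Ty>, Ty) {
    let mut vars: HashMap<String, u32> = HashMap::new();
    let mut fresh = |name: &str| {
        let v = *vars.entry(name.to_string()).or_insert_with(|| {
            let v = *next_var;
            *next_var += 1;
            v
        });
        Region::Var(v)
    };
    let ins: Vec<Ty> = inputs.iter().map(|t| map_bound(t, &mut fresh)).collect();
    let out = map_bound(output, &mut fresh);
    (ins, out)
}

/// When checking the body of item `item`: turn its bound regions into free ones.
fn liberate(ty: &Ty, item: NodeId) -> Ty {
    map_bound(ty, &mut |name: &str| Region::Free(item, name.to_string()))
}
```

Each call to `instantiate` draws new variables, which is what lets a function like `fn(&a.T) -> &a.T` be called twice with different regions, while `liberate` gives the body a single fixed region per name.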
@@ -153,10 +28,10 @@ type binding = {node_id: ast::node_id,
name: ~str,
br: ty::bound_region};
// Mapping from a block/expr/binding to the innermost scope that
// bounds its lifetime. For a block/expression, this is the lifetime
// in which it will be evaluated. For a binding, this is the lifetime
// in which is in scope.
/// Mapping from a block/expr/binding to the innermost scope that
/// bounds its lifetime. For a block/expression, this is the lifetime
/// in which it will be evaluated. For a binding, this is the lifetime
/// in which it is in scope.
type region_map = hashmap<ast::node_id, ast::node_id>;
type ctxt = {
@@ -198,8 +73,8 @@ type ctxt = {
parent: parent
// Returns true if `subscope` is equal to or is lexically nested inside
// `superscope` and false otherwise.
/// Returns true if `subscope` is equal to or is lexically nested inside
/// `superscope` and false otherwise.
fn scope_contains(region_map: region_map, superscope: ast::node_id,
subscope: ast::node_id) -> bool {
let mut subscope = subscope;
@@ -212,6 +87,9 @@ fn scope_contains(region_map: region_map, superscope: ast::node_id,
ret true;
/// Finds the nearest common ancestor (if any) of two scopes. That
/// is, finds the smallest scope which is greater than or equal to
/// both `scope_a` and `scope_b`.
fn nearest_common_ancestor(region_map: region_map, scope_a: ast::node_id,
scope_b: ast::node_id) -> option<ast::node_id> {
@@ -262,6 +140,7 @@ fn nearest_common_ancestor(region_map: region_map, scope_a: ast::node_id,
/// Extracts the current parent from cx, failing if there is none.
fn parent_id(cx: ctxt, span: span) -> ast::node_id {
alt cx.parent {
none {
@@ -273,6 +152,7 @@ fn parent_id(cx: ctxt, span: span) -> ast::node_id {
/// Records the current parent (if any) as the parent of `child_id`.
fn record_parent(cx: ctxt, child_id: ast::node_id) {
alt cx.parent {
none { /* no-op */ }
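The `region_map` stores one parent link per scope, so both `scope_contains` and `nearest_common_ancestor` can be answered by walking those links. A minimal sketch in present-day Rust, assuming scopes are plain integers (`NodeId` and `ancestors` are hypothetical names); this `nearest_common_ancestor` builds both ancestor chains and returns the innermost shared scope, which is not necessarily how the function in this file computes it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ast::node_id`.
type NodeId = u32;

/// Maps each scope to its innermost enclosing (parent) scope, like `region_map`.
type RegionMap = HashMap<NodeId, NodeId>;

/// True if `subscope` equals `superscope` or is lexically nested inside it.
fn scope_contains(map: &RegionMap, superscope: NodeId, subscope: NodeId) -> bool {
    let mut cur = subscope;
    loop {
        if cur == superscope {
            return true;
        }
        match map.get(&cur) {
            Some(&parent) => cur = parent,
            None => return false, // reached a root without meeting `superscope`
        }
    }
}

/// Chain of scopes from `scope` up to its root, innermost first.
fn ancestors(map: &RegionMap, scope: NodeId) -> Vec<NodeId> {
    let mut chain = vec![scope];
    let mut cur = scope;
    while let Some(&parent) = map.get(&cur) {
        chain.push(parent);
        cur = parent;
    }
    chain
}

/// Smallest scope containing both `a` and `b`, or `None` if they share no root.
fn nearest_common_ancestor(map: &RegionMap, a: NodeId, b: NodeId) -> Option<NodeId> {
    let chain_b = ancestors(map, b);
    ancestors(map, a).into_iter().find(|s| chain_b.contains(s))
}
```

Searching `a`'s chain innermost-first guarantees the first match is the smallest common scope; the cost is quadratic in nesting depth, which is usually small.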


@@ -307,6 +307,13 @@ enum closure_kind {
/// Innards of a function type:
/// - `purity` is the function's effect (pure, impure, unsafe).
/// - `proto` is the protocol (fn@, fn~, etc.).
/// - `inputs` is the list of arguments and their modes.
/// - `output` is the return type.
/// - `ret_style` indicates whether the function returns a value or fails.
type fn_ty = {purity: ast::purity,
proto: ast::proto,
inputs: ~[arg],
@@ -315,13 +322,32 @@ type fn_ty = {purity: ast::purity,
type param_ty = {idx: uint, def_id: def_id};
// See discussion at head of
/// Representation of regions:
enum region {
/// Bound regions are found (primarily) in function types. They indicate
/// region parameters that have yet to be replaced with actual regions
/// (analogous to type parameters, except that due to the monomorphic
/// nature of our type system, bound type parameters are always replaced
/// with fresh type variables whenever an item is referenced, so type
/// parameters only appear "free" in types. Regions, in contrast, can
/// appear free or bound). When a function is called, all bound regions
/// tied to that function's node-id are replaced with fresh region
/// variables whose value is then inferred.
/// When checking a function body, the types of all arguments and so forth
/// that refer to bound region parameters are modified to refer to free
/// region parameters.
re_free(node_id, bound_region),
/// A concrete region naming some expression within the current function.
/// Static data that has an "infinite" lifetime.
/// A region variable. Should not exist after typeck.
re_static // effectively `top` in the region lattice
enum bound_region {
@@ -332,16 +358,22 @@ enum bound_region {
type opt_region = option<region>;
// The type substs represents the kinds of things that can be substituted into
// a type. There may be at most one region parameter (self_r), along with
// some number of type parameters (tps).
// The region parameter is present on nominative types (enums, resources,
// classes) that are declared as having a region parameter. If the type is
// declared as `enum foo&`, then self_r should always be non-none. If the
// type is declared as `enum foo`, then self_r will always be none. In the
// latter case, typeck::ast_ty_to_ty() will reject any references to `&T` or
// `&self.T` within the type and report an error.
/// The type substs represents the kinds of things that can be substituted to
/// convert a polytype into a monotype. Note however that substituting bound
/// regions other than `self` is done through a different mechanism.
/// `tps` represents the type parameters in scope. They are indexed according
/// to the order in which they were declared.
/// `self_r` indicates the region parameter `self` that is present on nominal
/// types (enums, classes) declared as having a region parameter. `self_r`
/// should always be none for types that are not region-parameterized and
/// some(_) for types that are. The only bound region parameter that should
/// appear within a region-parameterized type is `self`.
/// `self_ty` is the type to which `self` should be remapped, if any. The
/// `self` type is rather funny in that it can only appear on interfaces and
/// is always substituted away to the implementing type for an interface.
type substs = {
self_r: opt_region,
self_ty: option<ty::t>,