Implement the new region hierarchy rules, in which regions from distinct

hierarchies are judged based on the lexical relationship of their
respective fn bodies.
This commit is contained in:
Niko Matsakis 2015-03-05 06:24:22 -05:00
parent bc959fdb28
commit a53a22eab9
3 changed files with 105 additions and 135 deletions

View File

@ -249,114 +249,61 @@ there is a reference created whose lifetime does not enclose
the borrow expression, we must issue sufficient restrictions to ensure
that the pointee remains valid.
## Adding closures
## Modeling closures
The other significant complication to the region hierarchy is
closures. I will describe here how closures should work, though some
of the work to implement this model is ongoing at the time of this
writing.
Integrating closures properly into the model is a bit of
work-in-progress. In an ideal world, we would model closures as
closely as possible after their desugared equivalents. That is, a
closure type would be modeled as a struct, and the region hierarchy of
different closure bodies would be completely distinct from all other
fns. We are generally moving in that direction but there are
complications in terms of the implementation.
The body of closures are type-checked along with the function that
creates them. However, unlike other expressions that appear within the
function body, it is not entirely obvious when a closure body executes
with respect to the other expressions. This is because the closure
body will execute whenever the closure is called; however, we can
never know precisely when the closure will be called, especially
without some sort of alias analysis.
In practice what we currently do is somewhat different. The basis for
the current approach is the observation that the only time that
regions from distinct fn bodies interact with one another is through
an upvar or the type of a fn parameter (since closures live in the fn
body namespace, they can in fact have fn parameters whose types
include regions from the surrounding fn body). For these cases, there
are separate mechanisms which ensure that the regions that appear in
upvars/parameters outlive the dynamic extent of each call to the
closure:
However, we can place some sort of limits on when the closure
executes. In particular, the type of every closure `fn:'r K` includes
a region bound `'r`. This bound indicates the maximum lifetime of that
closure; once we exit that region, the closure cannot be called
anymore. Therefore, we say that the lifetime of the closure body is a
sublifetime of the closure bound, but the closure body itself is unordered
with respect to other parts of the code.
1. Types must outlive the region of any expression where they are used.
For a closure type `C` to outlive a region `'r`, that implies that the
types of all its upvars must outlive `'r`.
2. Parameters must outlive the region of any fn that they are passed to.
For example, consider the following fragment of code:
Therefore, we can -- sort of -- assume that when we are asked to
compare a region `'a` from a closure with a region `'b` from the fn
that encloses it, in fact `'b` is the larger region. And that is
precisely what we do: when building the region hierarchy, each region
lives in its own distinct subtree, but if we are asked to compute the
`LUB(r1, r2)` of two regions, and those regions are in disjoint
subtrees, we compare the lexical nesting of the two regions.
'a: {
let closure: fn:'a() = || 'b: {
'c: ...
};
'd: ...
}
Here we have four lifetimes, `'a`, `'b`, `'c`, and `'d`. The closure
`closure` is bounded by the lifetime `'a`. The lifetime `'b` is the
lifetime of the closure body, and `'c` is some statement within the
closure body. Finally, `'d` is a statement within the outer block that
created the closure.
We can say that the closure body `'b` is a sublifetime of `'a` due to
the closure bound. By the usual lexical scoping conventions, the
statement `'c` is clearly a sublifetime of `'b`, and `'d` is a
sublifetime of `'d`. However, there is no ordering between `'c` and
`'d` per se (this kind of ordering between statements is actually only
an issue for dataflow; passes like the borrow checker must assume that
closures could execute at any time from the moment they are created
until they go out of scope).
### Complications due to closure bound inference
There is only one problem with the above model: in general, we do not
actually *know* the closure bounds during region inference! In fact,
closure bounds are almost always region variables! This is very tricky
because the inference system implicitly assumes that we can do things
like compute the LUB of two scoped lifetimes without needing to know
the values of any variables.
Here is an example to illustrate the problem:
fn identify<T>(x: T) -> T { x }
fn foo() { // 'foo is the function body
'a: {
let closure = identity(|| 'b: {
'c: ...
});
'd: closure();
}
'e: ...;
}
In this example, the closure bound is not explicit. At compile time,
we will create a region variable (let's call it `V0`) to represent the
closure bound.
The primary difficulty arises during the constraint propagation phase.
Imagine there is some variable with incoming edges from `'c` and `'d`.
This means that the value of the variable must be `LUB('c,
'd)`. However, without knowing what the closure bound `V0` is, we
can't compute the LUB of `'c` and `'d`! Any we don't know the closure
bound until inference is done.
The solution is to rely on the fixed point nature of inference.
Basically, when we must compute `LUB('c, 'd)`, we just use the current
value for `V0` as the closure's bound. If `V0`'s binding should
change, then we will do another round of inference, and the result of
`LUB('c, 'd)` will change.
One minor implication of this is that the graph does not in fact track
the full set of dependencies between edges. We cannot easily know
whether the result of a LUB computation will change, since there may
be indirect dependencies on other variables that are not reflected on
the graph. Therefore, we must *always* iterate over all edges when
doing the fixed point calculation, not just those adjacent to nodes
whose values have changed.
Were it not for this requirement, we could in fact avoid fixed-point
iteration altogether. In that universe, we could instead first
identify and remove strongly connected components (SCC) in the graph.
Note that such components must consist solely of region variables; all
of these variables can effectively be unified into a single variable.
Once SCCs are removed, we are left with a DAG. At this point, we
could walk the DAG in topological order once to compute the expanding
nodes, and again in reverse topological order to compute the
contracting nodes. However, as I said, this does not work given the
current treatment of closure bounds, but perhaps in the future we can
address this problem somehow and make region inference somewhat more
efficient. Note that this is solely a matter of performance, not
expressiveness.
*Ideas for improving the situation:* The correct argument here is
subtle and a bit hand-wavy. The ideal, as stated earlier, would be to
model things in such a way that it corresponds more closely to the
desugared code. The best approach for doing this is a bit unclear: it
may in fact be possible to *actually* desugar before we start, but I
don't think so. The main option that I've been thinking through is
imposing a "view shift" as we enter the fn body, so that regions
appearing in the types of fn parameters and upvars are translated from
being regions in the outer fn into free region parameters, just as
they would be if we applied the desugaring. The challenge here is that
type inference may not have fully run, so the types may not be fully
known: we could probably do this translation lazilly, as type
variables are instantiated. We would also have to apply a kind of
inverse translation to the return value. This would be a good idea
anyway, as right now it is possible for free regions instantiated
within the closure to leak into the parent: this currently leads to
type errors, since those regions cannot outlive any expressions within
the parent hierarchy. Much like the current handling of closures,
there are no known cases where this leads to a type-checking accepting
incorrect code (though it sometimes rejects what might be considered
correct code; see rust-lang/rust#22557), but it still doesn't feel
like the right approach.
### Skolemization

View File

@ -262,7 +262,9 @@ pub struct RegionMaps {
/// fn, if any. Thus the map structures the fn bodies into a
/// hierarchy based on their lexical mapping. This is used to
/// handle the relationships between regions in a fn and in a
/// closure defined by that fn.
/// closure defined by that fn. See the "Modeling closures"
/// section of the README in middle::infer::region_inference for
/// more details.
fn_tree: RefCell<NodeMap<ast::NodeId>>,
}
@ -337,7 +339,9 @@ pub struct Context {
/// the root of the current region tree. This is typically the id
/// of the innermost fn body. Each fn forms its own disjoint tree
/// in the region hierarchy. These fn bodies are themselves
/// arranged into a tree.
/// arranged into a tree. See the "Modeling closures" section of
/// the README in middle::infer::region_inference for more
/// details.
root_id: Option<ast::NodeId>,
/// the scope that contains any new variables declared
@ -411,7 +415,18 @@ fn record_fn_parent(&self, sub_fn: ast::NodeId, sup_fn: ast::NodeId) {
assert!(previous.is_none());
}
fn record_encl_scope(&self, sub: CodeExtent, sup: CodeExtent) {
fn fn_is_enclosed_by(&self, mut sub_fn: ast::NodeId, sup_fn: ast::NodeId) -> bool {
let fn_tree = self.fn_tree.borrow();
loop {
if sub_fn == sup_fn { return true; }
match fn_tree.get(&sub_fn) {
Some(&s) => { sub_fn = s; }
None => { return false; }
}
}
}
pub fn record_encl_scope(&self, sub: CodeExtent, sup: CodeExtent) {
debug!("record_encl_scope(sub={:?}, sup={:?})", sub, sup);
assert!(sub != sup);
self.scope_map.borrow_mut().insert(sub, sup);
@ -600,7 +615,7 @@ pub fn nearest_common_ancestor(&self,
let mut a_index = a_ancestors.len() - 1;
let mut b_index = b_ancestors.len() - 1;
// Here, ~[ab]_ancestors is a vector going from narrow to broad.
// Here, [ab]_ancestors is a vector going from narrow to broad.
// The end of each vector will be the item where the scope is
// defined; if there are any common ancestors, then the tails of
// the vector will be the same. So basically we want to walk
@ -609,7 +624,32 @@ pub fn nearest_common_ancestor(&self,
// then the corresponding scope is a superscope of the other.
if a_ancestors[a_index] != b_ancestors[b_index] {
return None;
// In this case, the two regions belong to completely
// different functions. Compare those fn for lexical
// nesting. The reasoning behind this is subtle. See the
// "Modeling closures" section of the README in
// middle::infer::region_inference for more details.
let a_root_scope = a_ancestors[a_index];
let b_root_scope = a_ancestors[a_index];
return match (a_root_scope, b_root_scope) {
(CodeExtent::DestructionScope(a_root_id),
CodeExtent::DestructionScope(b_root_id)) => {
if self.fn_is_enclosed_by(a_root_id, b_root_id) {
// `a` is enclosed by `b`, hence `b` is the ancestor of everything in `a`
Some(scope_b)
} else if self.fn_is_enclosed_by(b_root_id, a_root_id) {
// `b` is enclosed by `a`, hence `a` is the ancestor of everything in `b`
Some(scope_a)
} else {
// neither fn encloses the other
None
}
}
_ => {
// root ids are always Misc right now
unreachable!()
}
};
}
loop {
@ -675,6 +715,7 @@ fn resolve_block(visitor: &mut RegionResolutionVisitor, blk: &ast::Block) {
let prev_cx = visitor.cx;
let blk_scope = CodeExtent::Misc(blk.id);
// If block was previously marked as a terminating scope during
// the recursive visit of its parent node in the AST, then we need
// to account for the destruction scope representing the extent of
@ -1144,7 +1185,7 @@ fn resolve_item(visitor: &mut RegionResolutionVisitor, item: &ast::Item) {
}
fn resolve_fn(visitor: &mut RegionResolutionVisitor,
fk: FnKind,
_: FnKind,
decl: &ast::FnDecl,
body: &ast::Block,
sp: Span,
@ -1180,32 +1221,13 @@ fn resolve_fn(visitor: &mut RegionResolutionVisitor,
};
visit::walk_fn_decl(visitor, decl);
// The body of the fn itself is either a root scope (top-level fn)
// or it continues with the inherited scope (closures).
match fk {
visit::FkItemFn(..) | visit::FkMethod(..) => {
visitor.cx = Context {
root_id: Some(body.id),
parent: InnermostEnclosingExpr::None,
var_parent: InnermostDeclaringBlock::None
};
visitor.visit_block(body);
}
visit::FkFnBlock(..) => {
// FIXME(#3696) -- at present we are place the closure body
// within the region hierarchy exactly where it appears lexically.
// This is wrong because the closure may live longer
// than the enclosing expression. We should probably fix this,
// but the correct fix is a bit subtle, and I am also not sure
// that the present approach is unsound -- it may not permit
// any illegal programs. See issue for more details.
visitor.cx = Context {
root_id: Some(body.id),
..outer_cx
};
visitor.visit_block(body);
}
}
// The body of the every fn is a root scope.
visitor.cx = Context {
root_id: Some(body.id),
parent: InnermostEnclosingExpr::None,
var_parent: InnermostDeclaringBlock::None
};
visitor.visit_block(body);
// Restore context we had at the start.
visitor.cx = outer_cx;

View File

@ -588,6 +588,7 @@ fn lub_free_free() {
fn lub_returning_scope() {
test_env(EMPTY_SOURCE_STR,
errors(&["cannot infer an appropriate lifetime"]), |env| {
env.create_simple_region_hierarchy();
let t_rptr_scope10 = env.t_rptr_scope(10);
let t_rptr_scope11 = env.t_rptr_scope(11);