238 lines
11 KiB
Rust
Raw Normal View History

2014-01-15 19:44:38 -05:00
// Copyright 2014 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
/*!
# Documentation for the trans module
This module contains high-level summaries of how the various modules
in trans work. It is a work in progress. For detailed comments,
naturally, you can refer to the individual modules themselves.
## The Expr module
The expr module handles translation of expressions. The most general
translation routine is `trans()`, which will translate an expression
into a datum. `trans_into()` is also available, which will translate
an expression and write the result directly into memory, sometimes
avoiding the need for a temporary stack slot. Finally,
`trans_to_lvalue()` is available if you'd like to ensure that the
result has cleanup scheduled.
Internally, each of these functions dispatches to various other
expression functions depending on the kind of expression. We divide
up expressions into:
- **Datum expressions:** Those that most naturally yield values.
Examples would be `22`, `box x`, or `a + b` (when not overloaded).
- **DPS expressions:** Those that most naturally write into a location
in memory. Examples would be `foo()` or `Point { x: 3, y: 4 }`.
- **Statement expressions:** That that do not generate a meaningful
result. Examples would be `while { ... }` or `return 44`.
## The Datum module
A `Datum` encapsulates the result of evaluating a Rust expression. It
contains a `ValueRef` indicating the result, a `ty::t` describing
the Rust type, but also a *kind*. The kind indicates whether the datum
has cleanup scheduled (lvalue) or not (rvalue) and -- in the case of
rvalues -- whether or not the value is "by ref" or "by value".
The datum API is designed to try and help you avoid memory errors like
forgetting to arrange cleanup or duplicating a value. The type of the
datum incorporates the kind, and thus reflects whether it has cleanup
scheduled:
- `Datum<Lvalue>` -- by ref, cleanup scheduled
- `Datum<Rvalue>` -- by value or by ref, no cleanup scheduled
- `Datum<Expr>` -- either `Datum<Lvalue>` or `Datum<Rvalue>`
Rvalue and expr datums are noncopyable, and most of the methods on
datums consume the datum itself (with some notable exceptions). This
reflects the fact that datums may represent affine values which ought
to be consumed exactly once, and if you were to try to (for example)
store an affine value multiple times, you would be duplicating it,
which would certainly be a bug.
Some of the datum methods, however, are designed to work only on
copyable values such as ints or pointers. Those methods may borrow the
datum (`&self`) rather than consume it, but they always include
assertions on the type of the value represented to check that this
makes sense. An example is `shallow_copy_and_take()`, which duplicates
a datum value.
Translating an expression always yields a `Datum<Expr>` result, but
the methods `to_[lr]value_datum()` can be used to coerce a
`Datum<Expr>` into a `Datum<Lvalue>` or `Datum<Rvalue>` as
needed. Coercing to an lvalue is fairly common, and generally occurs
whenever it is necessary to inspect a value and pull out its
subcomponents (for example, a match, or indexing expression). Coercing
to an rvalue is more unusual; it occurs when moving values from place
to place, such as in an assignment expression or parameter passing.
### Lvalues in detail
An lvalue datum is one for which cleanup has been scheduled. Lvalue
datums are always located in memory, and thus the `ValueRef` for an
LLVM value is always a pointer to the actual Rust value. This means
that if the Datum has a Rust type of `int`, then the LLVM type of the
`ValueRef` will be `int*` (pointer to int).
Because lvalues already have cleanups scheduled, the memory must be
zeroed to prevent the cleanup from taking place (presuming that the
Rust type needs drop in the first place, otherwise it doesn't
matter). The Datum code automatically performs this zeroing when the
value is stored to a new location, for example.
Lvalues usually result from evaluating lvalue expressions. For
example, evaluating a local variable `x` yields an lvalue, as does a
reference to a field like `x.f` or an index `x[i]`.
Lvalue datums can also arise by *converting* an rvalue into an lvalue.
This is done with the `to_lvalue_datum` method defined on
`Datum<Expr>`. Basically this method just schedules cleanup if the
datum is an rvalue, possibly storing the value into a stack slot first
if needed. Converting rvalues into lvalues occurs in constructs like
`&foo()` or `match foo() { ref x => ... }`, where the user is
implicitly requesting a temporary.
Somewhat surprisingly, not all lvalue expressions yield lvalue datums
when trans'd. Ultimately the reason for this is to micro-optimize
the resulting LLVM. For example, consider the following code:
fn foo() -> Box<int> { ... }
let x = *foo();
The expression `*foo()` is an lvalue, but if you invoke `expr::trans`,
it will return an rvalue datum. See `deref_once` in expr.rs for
more details.
### Rvalues in detail
Rvalues datums are values with no cleanup scheduled. One must be
careful with rvalue datums to ensure that cleanup is properly
arranged, usually by converting to an lvalue datum or by invoking the
`add_clean` method.
### Scratch datums
Sometimes you need some temporary scratch space. The functions
`[lr]value_scratch_datum()` can be used to get temporary stack
space. As their name suggests, they yield lvalues and rvalues
respectively. That is, the slot from `lvalue_scratch_datum` will have
cleanup arranged, and the slot from `rvalue_scratch_datum` does not.
## The Cleanup module
The cleanup module tracks what values need to be cleaned up as scopes
are exited, either via failure or just normal control flow. The basic
idea is that the function context maintains a stack of cleanup scopes
that are pushed/popped as we traverse the AST tree. There is typically
at least one cleanup scope per AST node; some AST nodes may introduce
additional temporary scopes.
Cleanup items can be scheduled into any of the scopes on the stack.
Typically, when a scope is popped, we will also generate the code for
each of its cleanups at that time. This corresponds to a normal exit
from a block (for example, an expression completing evaluation
successfully without failure). However, it is also possible to pop a
block *without* executing its cleanups; this is typically used to
guard intermediate values that must be cleaned up on failure, but not
if everything goes right. See the section on custom scopes below for
more details.
Cleanup scopes come in three kinds:
- **AST scopes:** each AST node in a function body has a corresponding
AST scope. We push the AST scope when we start generate code for an AST
node and pop it once the AST node has been fully generated.
- **Loop scopes:** loops have an additional cleanup scope. Cleanups are
never scheduled into loop scopes; instead, they are used to record the
basic blocks that we should branch to when a `continue` or `break` statement
is encountered.
- **Custom scopes:** custom scopes are typically used to ensure cleanup
of intermediate values.
### When to schedule cleanup
Although the cleanup system is intended to *feel* fairly declarative,
it's still important to time calls to `schedule_clean()` correctly.
Basically, you should not schedule cleanup for memory until it has
been initialized, because if an unwind should occur before the memory
is fully initialized, then the cleanup will run and try to free or
drop uninitialized memory. If the initialization itself produces
byproducts that need to be freed, then you should use temporary custom
scopes to ensure that those byproducts will get freed on unwind. For
example, an expression like `box foo()` will first allocate a box in the
heap and then call `foo()` -- if `foo()` should fail, this box needs
to be *shallowly* freed.
### Long-distance jumps
In addition to popping a scope, which corresponds to normal control
flow exiting the scope, we may also *jump out* of a scope into some
earlier scope on the stack. This can occur in response to a `return`,
`break`, or `continue` statement, but also in response to failure. In
any of these cases, we will generate a series of cleanup blocks for
each of the scopes that is exited. So, if the stack contains scopes A
... Z, and we break out of a loop whose corresponding cleanup scope is
X, we would generate cleanup blocks for the cleanups in X, Y, and Z.
After cleanup is done we would branch to the exit point for scope X.
But if failure should occur, we would generate cleanups for all the
scopes from A to Z and then resume the unwind process afterwards.
To avoid generating tons of code, we cache the cleanup blocks that we
create for breaks, returns, unwinds, and other jumps. Whenever a new
cleanup is scheduled, though, we must clear these cached blocks. A
possible improvement would be to keep the cached blocks but simply
generate a new block which performs the additional cleanup and then
branches to the existing cached blocks.
### AST and loop cleanup scopes
AST cleanup scopes are pushed when we begin and end processing an AST
node. They are used to house cleanups related to rvalue temporary that
get referenced (e.g., due to an expression like `&Foo()`). Whenever an
AST scope is popped, we always trans all the cleanups, adding the cleanup
code after the postdominator of the AST node.
AST nodes that represent breakable loops also push a loop scope; the
loop scope never has any actual cleanups, it's just used to point to
the basic blocks where control should flow after a "continue" or
"break" statement. Popping a loop scope never generates code.
### Custom cleanup scopes
Custom cleanup scopes are used for a variety of purposes. The most
common though is to handle temporary byproducts, where cleanup only
needs to occur on failure. The general strategy is to push a custom
cleanup scope, schedule *shallow* cleanups into the custom scope, and
then pop the custom scope (without transing the cleanups) when
execution succeeds normally. This way the cleanups are only trans'd on
unwind, and only up until the point where execution succeeded, at
which time the complete value should be stored in an lvalue or some
other place where normal cleanup applies.
To spell it out, here is an example. Imagine an expression `box expr`.
We would basically:
1. Push a custom cleanup scope C.
2. Allocate the box.
3. Schedule a shallow free in the scope C.
4. Trans `expr` into the box.
5. Pop the scope C.
6. Return the box as an rvalue.
This way, if a failure occurs while transing `expr`, the custom
cleanup scope C is pushed and hence the box will be freed. The trans
code for `expr` itself is responsible for freeing any other byproducts
that may be in play.
*/