rust/src/librustc_trans/trans/doc.rs

// Copyright 2014 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! # Documentation for the trans module
//!
//! This module contains high-level summaries of how the various modules
//! in trans work. It is a work in progress. For detailed comments,
//! naturally, you can refer to the individual modules themselves.
//!
//! ## The Expr module
//!
//! The expr module handles translation of expressions. The most general
//! translation routine is `trans()`, which will translate an expression
//! into a datum. `trans_into()` is also available, which will translate
//! an expression and write the result directly into memory, sometimes
//! avoiding the need for a temporary stack slot. Finally,
//! `trans_to_lvalue()` is available if you'd like to ensure that the
//! result has cleanup scheduled.
//!
//! Internally, each of these functions dispatches to various other
//! expression functions depending on the kind of expression. We divide
//! up expressions into:
//!
//! - **Datum expressions:** Those that most naturally yield values.
//!   Examples would be `22`, `box x`, or `a + b` (when not overloaded).
//! - **DPS expressions:** Those that most naturally write into a location
//!   in memory. Examples would be `foo()` or `Point { x: 3, y: 4 }`.
//! - **Statement expressions:** Those that do not generate a meaningful
//!   result. Examples would be `while { ... }` or `return 44`.
//!
//! ## The Datum module
//!
//! A `Datum` encapsulates the result of evaluating a Rust expression. It
//! contains a `ValueRef` indicating the result, a `Ty` describing the
//! Rust type, and a *kind*. The kind indicates whether the datum
//! has cleanup scheduled (lvalue) or not (rvalue) and -- in the case of
//! rvalues -- whether or not the value is "by ref" or "by value".
//!
//! The datum API is designed to help you avoid memory errors like
//! forgetting to arrange cleanup or duplicating a value. The type of the
//! datum incorporates the kind, and thus reflects whether it has cleanup
//! scheduled:
//!
//! - `Datum<Lvalue>` -- by ref, cleanup scheduled
//! - `Datum<Rvalue>` -- by value or by ref, no cleanup scheduled
//! - `Datum<Expr>` -- either `Datum<Lvalue>` or `Datum<Rvalue>`
//!
//! Rvalue and expr datums are noncopyable, and most of the methods on
//! datums consume the datum itself (with some notable exceptions). This
//! reflects the fact that datums may represent affine values which ought
//! to be consumed exactly once, and if you were to try to (for example)
//! store an affine value multiple times, you would be duplicating it,
//! which would certainly be a bug.
//!
//! Some of the datum methods, however, are designed to work only on
//! copyable values such as ints or pointers. Those methods may borrow the
//! datum (`&self`) rather than consume it, but they always include
//! assertions on the type of the value represented to check that this
//! makes sense. An example is `shallow_copy()`, which duplicates
//! a datum value.
//!
//! Translating an expression always yields a `Datum<Expr>` result, but
//! the methods `to_[lr]value_datum()` can be used to coerce a
//! `Datum<Expr>` into a `Datum<Lvalue>` or `Datum<Rvalue>` as
//! needed. Coercing to an lvalue is fairly common, and generally occurs
//! whenever it is necessary to inspect a value and pull out its
//! subcomponents (for example, a match, or indexing expression). Coercing
//! to an rvalue is more unusual; it occurs when moving values from place
//! to place, such as in an assignment expression or parameter passing.
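//!
//! As a rough illustration of the coercion described above, the sketch
//! below models the datum kinds as type parameters. It is not the real
//! API: the field layout, the `Expr` variant names, and the `cleanups`
//! parameter are assumptions made for the example, and the real
//! `to_lvalue_datum` takes different arguments.
//!
//! ```rust
//! // Hypothetical, simplified stand-ins for the types in datum.rs.
//! struct Lvalue; // cleanup already scheduled
//!
//! enum Expr {
//!     LvalueExpr, // cleanup already scheduled
//!     RvalueExpr, // no cleanup scheduled yet
//! }
//!
//! // Deliberately neither `Clone` nor `Copy`: a datum can only be moved.
//! struct Datum<K> {
//!     val: u32, // stands in for the `ValueRef`
//!     kind: K,
//! }
//!
//! impl Datum<Expr> {
//!     // Consumes the datum. If it was an rvalue, cleanup is scheduled
//!     // first, so the result always has cleanup scheduled.
//!     fn to_lvalue_datum(self, cleanups: &mut Vec<u32>) -> Datum<Lvalue> {
//!         if let Expr::RvalueExpr = self.kind {
//!             cleanups.push(self.val);
//!         }
//!         Datum { val: self.val, kind: Lvalue }
//!     }
//! }
//!
//! // Coerces one rvalue and one lvalue; only the rvalue schedules cleanup.
//! fn demo() -> Vec<u32> {
//!     let mut cleanups = Vec::new();
//!     let temp = Datum { val: 1, kind: Expr::RvalueExpr };
//!     let local = Datum { val: 2, kind: Expr::LvalueExpr };
//!     let _temp: Datum<Lvalue> = temp.to_lvalue_datum(&mut cleanups);
//!     let _local: Datum<Lvalue> = local.to_lvalue_datum(&mut cleanups);
//!     // `temp` and `local` have been moved; using them again would not compile.
//!     cleanups
//! }
//! ```
//!
//! Because the method takes `self` by value, the compiler rejects any
//! later use of the original datum, which is the property the datum API
//! relies on to prevent accidental duplication.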
//!
//! ### Lvalues in detail
//!
//! An lvalue datum is one for which cleanup has been scheduled. Lvalue
//! datums are always located in memory, and thus the `ValueRef` for an
//! lvalue datum is always a pointer to the actual Rust value. This means
//! that if the Datum has a Rust type of `int`, then the LLVM type of the
//! `ValueRef` will be `int*` (pointer to int).
//!
//! Because lvalues already have cleanups scheduled, moving a value out of
//! an lvalue requires zeroing the memory to prevent the cleanup from
//! taking place (presuming that the Rust type needs drop in the first
//! place; otherwise it doesn't matter). The Datum code performs this
//! zeroing automatically, for example when the value is stored to a new
//! location.
//!
//! Lvalues usually result from evaluating lvalue expressions. For
//! example, evaluating a local variable `x` yields an lvalue, as does a
//! reference to a field like `x.f` or an index `x[i]`.
//!
//! Lvalue datums can also arise by *converting* an rvalue into an lvalue.
//! This is done with the `to_lvalue_datum` method defined on
//! `Datum<Expr>`. Basically this method just schedules cleanup if the
//! datum is an rvalue, possibly storing the value into a stack slot first
//! if needed. Converting rvalues into lvalues occurs in constructs like
//! `&foo()` or `match foo() { ref x => ... }`, where the user is
//! implicitly requesting a temporary.
//!
//! Somewhat surprisingly, not all lvalue expressions yield lvalue datums
//! when trans'd. Ultimately the reason for this is to micro-optimize
//! the resulting LLVM. For example, consider the following code:
//!
//!     fn foo() -> Box<int> { ... }
//!     let x = *foo();
//!
//! The expression `*foo()` is an lvalue, but if you invoke `expr::trans`,
//! it will return an rvalue datum. See `deref_once` in expr.rs for
//! more details.
//!
//! ### Rvalues in detail
//!
//! Rvalue datums are values with no cleanup scheduled. One must be
//! careful with rvalue datums to ensure that cleanup is properly
//! arranged, usually by converting to an lvalue datum or by invoking the
//! `add_clean` method.
//!
//! ### Scratch datums
//!
//! Sometimes you need some temporary scratch space.  The functions
//! `[lr]value_scratch_datum()` can be used to get temporary stack
//! space. As their name suggests, they yield lvalues and rvalues
//! respectively. That is, the slot from `lvalue_scratch_datum` has
//! cleanup arranged, and the slot from `rvalue_scratch_datum` does not.
//!
//! ## The Cleanup module
//!
//! The cleanup module tracks what values need to be cleaned up as scopes
//! are exited, either via panic or just normal control flow. The basic
//! idea is that the function context maintains a stack of cleanup scopes
//! that are pushed/popped as we traverse the AST. There is typically
//! at least one cleanup scope per AST node; some AST nodes may introduce
//! additional temporary scopes.
//!
//! Cleanup items can be scheduled into any of the scopes on the stack.
//! Typically, when a scope is popped, we will also generate the code for
//! each of its cleanups at that time. This corresponds to a normal exit
//! from a block (for example, an expression completing evaluation
//! successfully without panic). However, it is also possible to pop a
//! scope *without* executing its cleanups; this is typically used to
//! guard intermediate values that must be cleaned up on panic, but not
//! if everything goes right. See the section on custom scopes below for
//! more details.
//!
//! Cleanup scopes come in three kinds:
//!
//! - **AST scopes:** each AST node in a function body has a corresponding
//!   AST scope. We push the AST scope when we start generating code for an
//!   AST node and pop it once the AST node has been fully generated.
//! - **Loop scopes:** loops have an additional cleanup scope. Cleanups are
//!   never scheduled into loop scopes; instead, they are used to record the
//!   basic blocks that we should branch to when a `continue` or `break` statement
//!   is encountered.
//! - **Custom scopes:** custom scopes are typically used to ensure cleanup
//!   of intermediate values.
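//!
//! The three kinds can be pictured with a toy scope stack. This is only a
//! sketch: `ScopeStack`, `ScopeKind`, and every method name below are
//! hypothetical, cleanups are plain strings rather than generated code,
//! and for brevity cleanups are scheduled only into the innermost scope.
//!
//! ```rust
//! // Hypothetical model of the cleanup scope stack; not the real API.
//! #[derive(Debug, PartialEq)]
//! enum ScopeKind {
//!     Ast,
//!     Loop,
//!     Custom,
//! }
//!
//! struct Scope {
//!     kind: ScopeKind,
//!     cleanups: Vec<&'static str>,
//! }
//!
//! #[derive(Default)]
//! struct ScopeStack {
//!     scopes: Vec<Scope>,
//!     emitted: Vec<&'static str>, // stands in for generated cleanup code
//! }
//!
//! impl ScopeStack {
//!     fn push(&mut self, kind: ScopeKind) {
//!         self.scopes.push(Scope { kind, cleanups: Vec::new() });
//!     }
//!
//!     // Schedules a cleanup into the innermost scope.
//!     fn schedule(&mut self, cleanup: &'static str) {
//!         let top = self.scopes.last_mut().expect("scope stack is empty");
//!         // Cleanups are never scheduled into loop scopes.
//!         assert_ne!(top.kind, ScopeKind::Loop);
//!         top.cleanups.push(cleanup);
//!     }
//!
//!     // Normal exit: pop the scope and emit the code for its cleanups.
//!     fn pop_and_trans(&mut self) {
//!         let scope = self.scopes.pop().expect("scope stack is empty");
//!         self.emitted.extend(scope.cleanups);
//!     }
//!
//!     // Pop the scope but discard its cleanups (used for custom scopes).
//!     fn pop_without_trans(&mut self) {
//!         self.scopes.pop().expect("scope stack is empty");
//!     }
//! }
//!
//! fn demo() -> Vec<&'static str> {
//!     let mut stack = ScopeStack::default();
//!     stack.push(ScopeKind::Ast);
//!     stack.schedule("drop temp");
//!     stack.push(ScopeKind::Custom);
//!     stack.schedule("shallow free");
//!     stack.pop_without_trans(); // custom scope: nothing is emitted
//!     stack.push(ScopeKind::Loop);
//!     stack.pop_and_trans(); // loop scope: holds no cleanups
//!     stack.pop_and_trans(); // AST scope: emits "drop temp"
//!     stack.emitted
//! }
//! ```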
//!
//! ### When to schedule cleanup
//!
//! Although the cleanup system is intended to *feel* fairly declarative,
//! it's still important to time calls to `schedule_clean()` correctly.
//! Basically, you should not schedule cleanup for memory until it has
//! been initialized, because if an unwind should occur before the memory
//! is fully initialized, then the cleanup will run and try to free or
//! drop uninitialized memory. If the initialization itself produces
//! byproducts that need to be freed, then you should use temporary custom
//! scopes to ensure that those byproducts will get freed on unwind.  For
//! example, an expression like `box foo()` will first allocate a box in the
//! heap and then call `foo()` -- if `foo()` should panic, this box needs
//! to be *shallowly* freed.
//!
//! ### Long-distance jumps
//!
//! In addition to popping a scope, which corresponds to normal control
//! flow exiting the scope, we may also *jump out* of a scope into some
//! earlier scope on the stack. This can occur in response to a `return`,
//! `break`, or `continue` statement, but also in response to panic. In
//! any of these cases, we will generate a series of cleanup blocks for
//! each of the scopes that is exited. So, if the stack contains scopes A
//! ... Z, and we break out of a loop whose corresponding cleanup scope is
//! X, we would generate cleanup blocks for the cleanups in X, Y, and Z.
//! After cleanup is done we would branch to the exit point for scope X.
//! But if panic should occur, we would generate cleanups for all the
//! scopes from A to Z and then resume the unwind process afterwards.
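//!
//! The A ... Z example can be worked through with a small model. The
//! types and function names below are hypothetical, and the innermost-first
//! ordering of the cleanups is an assumption of the sketch rather than
//! something stated above.
//!
//! ```rust
//! // Hypothetical model: a scope is a label plus its scheduled cleanups.
//! struct Scope {
//!     label: &'static str,
//!     cleanups: Vec<&'static str>,
//! }
//!
//! // Cleanups for a jump to the exit point of the scope named `target`:
//! // every scope from the top of the stack down to and including `target`.
//! fn cleanups_for_jump(scopes: &[Scope], target: &str) -> Vec<&'static str> {
//!     let idx = scopes
//!         .iter()
//!         .rposition(|s| s.label == target)
//!         .expect("target scope is not on the stack");
//!     scopes[idx..]
//!         .iter()
//!         .rev()
//!         .flat_map(|s| s.cleanups.iter().rev().copied())
//!         .collect()
//! }
//!
//! // Cleanups for an unwind: every scope on the stack is exited.
//! fn cleanups_for_unwind(scopes: &[Scope]) -> Vec<&'static str> {
//!     scopes
//!         .iter()
//!         .rev()
//!         .flat_map(|s| s.cleanups.iter().rev().copied())
//!         .collect()
//! }
//!
//! fn demo() -> (Vec<&'static str>, Vec<&'static str>) {
//!     let scopes = vec![
//!         Scope { label: "A", cleanups: vec!["drop a"] },
//!         Scope { label: "X", cleanups: vec!["drop x"] }, // the loop's scope
//!         Scope { label: "Y", cleanups: vec![] },
//!         Scope { label: "Z", cleanups: vec!["drop z"] },
//!     ];
//!     (cleanups_for_jump(&scopes, "X"), cleanups_for_unwind(&scopes))
//! }
//! ```
//!
//! In this model a `break` to X runs the cleanups of Z, Y, and X only,
//! while an unwind also runs those of A.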
//!
//! To avoid generating tons of code, we cache the cleanup blocks that we
//! create for breaks, returns, unwinds, and other jumps. Whenever a new
//! cleanup is scheduled, though, we must clear these cached blocks. A
//! possible improvement would be to keep the cached blocks but simply
//! generate a new block which performs the additional cleanup and then
//! branches to the existing cached blocks.
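//!
//! The caching rule can be sketched as follows. The names are
//! hypothetical and a "block" is modelled as a list of cleanup names; the
//! point is only that scheduling a cleanup invalidates the cached block.
//!
//! ```rust
//! // Hypothetical model of one scope with a cached exit block.
//! struct Scope {
//!     cleanups: Vec<&'static str>,
//!     cached_exit: Option<Vec<&'static str>>,
//!     blocks_built: u32, // how many times a block had to be generated
//! }
//!
//! impl Scope {
//!     fn new() -> Scope {
//!         Scope { cleanups: Vec::new(), cached_exit: None, blocks_built: 0 }
//!     }
//!
//!     // Scheduling a new cleanup makes any cached block stale.
//!     fn schedule(&mut self, cleanup: &'static str) {
//!         self.cleanups.push(cleanup);
//!         self.cached_exit = None;
//!     }
//!
//!     // Returns the cached block, generating it first if necessary.
//!     fn exit_block(&mut self) -> Vec<&'static str> {
//!         if self.cached_exit.is_none() {
//!             self.blocks_built += 1;
//!             self.cached_exit = Some(self.cleanups.clone());
//!         }
//!         self.cached_exit.clone().unwrap()
//!     }
//! }
//!
//! fn demo() -> (u32, Vec<&'static str>) {
//!     let mut scope = Scope::new();
//!     scope.schedule("drop a");
//!     scope.exit_block(); // generates the block
//!     scope.exit_block(); // reuses the cached block
//!     scope.schedule("drop b"); // clears the cache
//!     let block = scope.exit_block(); // generates a fresh block
//!     (scope.blocks_built, block)
//! }
//! ```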
//!
//! ### AST and loop cleanup scopes
//!
//! AST cleanup scopes are pushed when we begin processing an AST node and
//! popped when we finish. They are used to house cleanups related to
//! rvalue temporaries that get referenced (e.g., due to an expression like
//! `&Foo()`). Whenever an AST scope is popped, we always trans all the
//! cleanups, adding the cleanup code after the postdominator of the AST
//! node.
//!
//! AST nodes that represent breakable loops also push a loop scope; the
//! loop scope never has any actual cleanups, it's just used to point to
//! the basic blocks where control should flow after a `continue` or
//! `break` statement. Popping a loop scope never generates code.
//!
//! ### Custom cleanup scopes
//!
//! Custom cleanup scopes are used for a variety of purposes. The most
//! common, though, is to handle temporary byproducts, where cleanup only
//! needs to occur on panic. The general strategy is to push a custom
//! cleanup scope, schedule *shallow* cleanups into the custom scope, and
//! then pop the custom scope (without transing the cleanups) when
//! execution succeeds normally. This way the cleanups are only trans'd on
//! unwind, and only up until the point where execution succeeded, at
//! which time the complete value should be stored in an lvalue or some
//! other place where normal cleanup applies.
//!
//! To spell it out, here is an example. Imagine an expression `box expr`.
//! We would basically:
//!
//! 1. Push a custom cleanup scope C.
//! 2. Allocate the box.
//! 3. Schedule a shallow free in the scope C.
//! 4. Trans `expr` into the box.
//! 5. Pop the scope C.
//! 6. Return the box as an rvalue.
//!
//! This way, if a panic occurs while evaluating `expr`, the custom
//! cleanup scope C is still on the stack and hence the box will be
//! freed. The trans code for `expr` itself is responsible for freeing
//! any other byproducts that may be in play.
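//!
//! The six steps can be traced with a toy function context. Everything
//! here is hypothetical (`FnCtxt` as modelled, `trans_box`, and the string
//! "instructions"); the sketch only shows that the shallow free is visible
//! to an unwind during `expr` and absent from the normal path.
//!
//! ```rust
//! // Hypothetical model: scopes hold cleanup names, `code` holds the
//! // instructions emitted on the normal (non-panicking) path.
//! struct FnCtxt {
//!     scopes: Vec<Vec<&'static str>>,
//!     code: Vec<&'static str>,
//! }
//!
//! impl FnCtxt {
//!     // Cleanups an unwind would run right now, innermost scope first.
//!     fn unwind_cleanups(&self) -> Vec<&'static str> {
//!         self.scopes.iter().rev().flatten().copied().collect()
//!     }
//! }
//!
//! // Returns the cleanups that a panic inside `expr` would have run.
//! fn trans_box(fcx: &mut FnCtxt) -> Vec<&'static str> {
//!     fcx.scopes.push(Vec::new()); // 1. push custom scope C
//!     fcx.code.push("allocate box"); // 2. allocate the box
//!     fcx.scopes
//!         .last_mut()
//!         .expect("scope C was just pushed")
//!         .push("shallow free box"); // 3. schedule a shallow free in C
//!     let on_panic = fcx.unwind_cleanups(); // what an unwind in `expr` sees
//!     fcx.code.push("evaluate expr into box"); // 4. trans `expr`
//!     fcx.scopes.pop(); // 5. pop C without transing its cleanups
//!     on_panic // 6. the box itself would be returned as an rvalue
//! }
//!
//! fn demo() -> (Vec<&'static str>, Vec<&'static str>, usize) {
//!     let mut fcx = FnCtxt { scopes: Vec::new(), code: Vec::new() };
//!     let on_panic = trans_box(&mut fcx);
//!     (on_panic, fcx.code, fcx.scopes.len())
//! }
//! ```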