e3b7fedc20
I cleaned up comments prior to the 1.0 alpha release.
248 lines
11 KiB
XML
// Copyright 2012 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

//! # Type inference engine
//!
//! This is loosely based on standard HM-type inference, but with an
//! extension to try and accommodate subtyping. There is nothing
//! principled about this extension; it's sound---I hope!---but it's a
//! heuristic, ultimately, and does not guarantee that it finds a valid
//! typing even if one exists (in fact, there are known scenarios where it
//! fails, some of which may eventually become problematic).
//!
//! ## Key idea
//!
//! The main change is that each type variable T is associated with a
//! lower-bound L and an upper-bound U. L and U begin as bottom and top,
//! respectively, but gradually narrow in response to new constraints
//! being introduced. When a variable is finally resolved to a concrete
//! type, it can (theoretically) select any type that is a supertype of L
//! and a subtype of U.
//!
//! There are several critical invariants which we maintain:
//!
//! - the upper-bound of a variable only becomes lower and the lower-bound
//!   only becomes higher over time;
//! - the lower-bound L is always a subtype of the upper bound U;
//! - the lower-bound L and upper-bound U never refer to other type variables,
//!   but only to types (though those types may contain type variables).
//!
//! > An aside: if the terms upper- and lower-bound confuse you, think of
//! > "supertype" and "subtype". The upper-bound is a "supertype"
//! > (super=upper in Latin, or something like that anyway) and the lower-bound
//! > is a "subtype" (sub=lower in Latin). I find it helps to visualize
//! > a simple class hierarchy, like Java minus interfaces and
//! > primitive types. The class Object is at the root (top) and other
//! > types lie in between. The bottom type is then the Null type.
//! > So the tree looks like:
//! >
//! > ```text
//! >         Object
//! >         /    \
//! >     String   Other
//! >         \    /
//! >         (null)
//! > ```
//! >
//! > So the upper bound type is the "supertype" and the lower bound is the
//! > "subtype".
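//!
//! As a rough illustration of the invariants above, here is a minimal
//! sketch of one variable's bounds over the toy hierarchy from the aside.
//! The names `Ty`, `Bounds`, `add_lower`, and `add_upper` are hypothetical
//! and chosen for this example; they are not the inferencer's own types.
//!
//! ```rust
//! /// Toy types taken from the class hierarchy in the aside.
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! enum Ty { Null, Str, Other, Object }
//!
//! /// `a <: b` in the toy hierarchy: `Null` is bottom, `Object` is top.
//! fn is_subtype(a: Ty, b: Ty) -> bool {
//!     a == b || a == Ty::Null || b == Ty::Object
//! }
//!
//! /// Greatest lower bound of two toy types.
//! fn glb(a: Ty, b: Ty) -> Ty {
//!     if is_subtype(a, b) { a } else if is_subtype(b, a) { b } else { Ty::Null }
//! }
//!
//! /// Least upper bound of two toy types.
//! fn lub(a: Ty, b: Ty) -> Ty {
//!     if is_subtype(a, b) { b } else if is_subtype(b, a) { a } else { Ty::Object }
//! }
//!
//! /// Hypothetical per-variable state: a lower bound and an upper bound.
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! struct Bounds { lb: Ty, ub: Ty }
//!
//! impl Bounds {
//!     /// A fresh variable starts with L = bottom and U = top.
//!     fn new() -> Bounds {
//!         Bounds { lb: Ty::Null, ub: Ty::Object }
//!     }
//!
//!     /// Records `var <: t`. The upper bound can only move down, and it
//!     /// must stay a supertype of the lower bound.
//!     fn add_upper(&mut self, t: Ty) -> Result<(), String> {
//!         let ub = glb(self.ub, t);
//!         if !is_subtype(self.lb, ub) {
//!             return Err(format!("{:?} is not a subtype of {:?}", self.lb, ub));
//!         }
//!         self.ub = ub;
//!         Ok(())
//!     }
//!
//!     /// Records `t <: var`. The lower bound can only move up, and it
//!     /// must stay a subtype of the upper bound.
//!     fn add_lower(&mut self, t: Ty) -> Result<(), String> {
//!         let lb = lub(self.lb, t);
//!         if !is_subtype(lb, self.ub) {
//!             return Err(format!("{:?} is not a subtype of {:?}", lb, self.ub));
//!         }
//!         self.lb = lb;
//!         Ok(())
//!     }
//! }
//! ```
//!
//! In this sketch, adding `Str` as a lower bound and then `Other` as an
//! upper bound is rejected, because `Str` is not a subtype of `Other`;
//! the bounds are left unchanged when a constraint is rejected.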
//!
//! ## Satisfying constraints
//!
//! At a primitive level, there is only one form of constraint that the
//! inferencer understands: a subtype relation. So the outside world can
//! say "make type A a subtype of type B". If there are variables
//! involved, the inferencer will adjust their upper- and lower-bounds as
//! needed to ensure that this relation is satisfied. (We also allow "make
//! type A equal to type B", but this is translated into "A <: B" and "B
//! <: A".)
//!
//! As stated above, we always maintain the invariant that type bounds
//! never refer to other variables. This keeps the inference relatively
//! simple, avoiding the scenario of having a kind of graph where we have
//! to pump constraints along and reach a fixed point, but it does impose
//! some heuristics in the case where the user is relating two type
//! variables A <: B.
//!
//! Combining two variables such that variable A will forever be a subtype
//! of variable B is the trickiest part of the algorithm because there is
//! often no right choice---that is, the right choice will depend on
//! future constraints which we do not yet know. The problem comes about
//! because both A and B have bounds that can be adjusted in the future.
//! Let's look at some of the cases that can come up.
//!
//! Imagine, to start, the best case, where both A and B have an upper and
//! lower bound (that is, the upper bounds are not top and the lower bounds
//! are not bot). In that case, if we're lucky, A.ub <: B.lb, and so we know
//! that whatever A and B should become, they will forever have the desired
//! subtyping relation. We can just leave things as they are.
//!
//! ### Option 1: Unify
//!
//! However, suppose that A.ub is *not* a subtype of B.lb. In
//! that case, we must make a decision. One option is to unify A
//! and B so that they are one variable whose bounds are:
//!
//!     UB = GLB(A.ub, B.ub)
//!     LB = LUB(A.lb, B.lb)
//!
//! (Note that we will have to verify that LB <: UB; if it does not hold,
//! the types are not intersecting and there is an error.) After unifying,
//! A <: B holds trivially because A==B. However, we have now lost some
//! flexibility, because perhaps the user intended for A and B to end up
//! as different types and not the same type.
//!
//! Pictorially, what this does is to take two distinct variables with
//! (hopefully not completely) distinct type ranges and produce one with
//! the intersection.
//!
//! ```text
//!                   B.ub                  B.ub
//!                    /\                    /
//!            A.ub   /  \           A.ub   /
//!            /   \ /    \              \ /
//!           /     X      \              UB
//!          /     / \      \            / \
//!         /     /   /      \          /   /
//!         \     \  /       /          \  /
//!          \      X       /             LB
//!           \    / \     /             / \
//!            \  /   \   /             /   \
//!            A.lb    B.lb          A.lb    B.lb
//! ```
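//!
//! The unification step can be sketched as follows. This is only an
//! illustration under a strong simplifying assumption: types are modeled
//! as levels in a total order (`a <: b` is `a <= b`), so GLB is `min` and
//! LUB is `max`. Real types form a lattice rather than a chain, and the
//! names `Ty`, `Bounds`, and `unify` are hypothetical.
//!
//! ```rust
//! use std::cmp::{max, min};
//!
//! /// Toy stand-in for a type: a level in a total order.
//! type Ty = u32;
//!
//! /// Hypothetical per-variable state: a lower bound and an upper bound.
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! struct Bounds { lb: Ty, ub: Ty }
//!
//! /// Merges two variables into one whose range is the intersection of
//! /// the two input ranges, as in Option 1.
//! fn unify(a: Bounds, b: Bounds) -> Result<Bounds, String> {
//!     let ub = min(a.ub, b.ub); // UB = GLB(A.ub, B.ub)
//!     let lb = max(a.lb, b.lb); // LB = LUB(A.lb, B.lb)
//!     if lb <= ub {
//!         Ok(Bounds { lb, ub })
//!     } else {
//!         // LB <: UB does not hold, so the ranges do not intersect.
//!         Err(format!("LB {} is not below UB {}", lb, ub))
//!     }
//! }
//! ```
//!
//! With these toy levels, unifying the ranges `[1, 6]` and `[3, 9]` gives
//! `[3, 6]`, while `[1, 2]` and `[5, 9]` do not intersect and produce an
//! error.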
//!
//!
//! ### Option 2: Relate UB/LB
//!
//! Another option is to keep A and B as distinct variables but set their
//! bounds in such a way that, whatever happens, we know that A <: B will hold.
//! This can be achieved by ensuring that A.ub <: B.lb. In practice there
//! are two ways to do that, depicted pictorially here:
//!
//! ```text
//!      Before                Option #1            Option #2
//!
//!                  B.ub                B.ub                B.ub
//!                   /\                 /  \                /  \
//!           A.ub   /  \        A.ub   /(B')\       A.ub   /(B')\
//!           /   \ /    \           \ /     /           \ /     /
//!          /     X      \         __UB____/             UB    /
//!         /     / \      \       /  |                   |    /
//!        /     /   /      \     /   |                   |   /
//!        \     \  /       /    /(A')|                   |  /
//!         \      X       /    /     LB            ______LB/
//!          \    / \     /    /     /  \          / (A')/ \
//!           \  /   \   /     \    /    \         \    /   \
//!           A.lb    B.lb       A.lb    B.lb       A.lb    B.lb
//! ```
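//!
//! In these diagrams, UB and LB are defined as before. As you can see,
//! the new ranges `A'` and `B'` are quite different from the range that
//! would be produced by unifying the variables. (The labels "Option #1"
//! and "Option #2" in the diagram number the two ways of relating the
//! bounds; they do not refer to the section headings.)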
//!
//! ### What we do now
//!
//! Our current technique is to *try* (transactionally) to relate the
//! existing bounds of A and B, if there are any (i.e., if `UB(A) != top
//! && LB(B) != bot`). If that succeeds, we're done. If it fails, then
//! we merge A and B into the same variable.
//!
//! This is not clearly the correct course. For example, if `UB(A) !=
//! top` but `LB(B) == bot`, we could conceivably set `LB(B)` to `UB(A)`
//! and leave the variables unmerged. This is sometimes the better
//! course; it depends on the program.
//!
//! The main case which fails today that I would like to support is:
//!
//! ```text
//! fn foo<T>(x: T, y: T) { ... }
//!
//! fn bar() {
//!     let x: @mut int = @mut 3;
//!     let y: @int = @3;
//!     foo(x, y);
//! }
//! ```
//!
//! In principle, the inferencer ought to find that the parameter `T` to
//! `foo(x, y)` is `@const int`. Today, however, it does not; this is
//! because the type variable `T` is merged with the type variable for
//! `x`, and thus inherits its UB/LB of `@mut int`. This leaves no
//! flexibility for `T` to later adjust to accommodate `@int`.
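//!
//! The "relate the bounds if present, otherwise merge" strategy from the
//! start of this section might be sketched like this. It is a toy model,
//! not the inferencer's code: types are levels in a total order (`a <: b`
//! is `a <= b`), so relating the bounds is a plain comparison with no
//! side effects and nothing needs to be rolled back. The names `Outcome`
//! and `make_subtype` are hypothetical.
//!
//! ```rust
//! use std::cmp::{max, min};
//!
//! /// Toy stand-in for a type: a level in a total order.
//! type Ty = u32;
//! const BOT: Ty = 0;
//! const TOP: Ty = u32::MAX;
//!
//! /// Hypothetical per-variable state: a lower bound and an upper bound.
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! struct Bounds { lb: Ty, ub: Ty }
//!
//! #[derive(Debug, PartialEq)]
//! enum Outcome {
//!     /// `A.ub <: B.lb` already holds; the variables stay distinct.
//!     Related,
//!     /// The variables were merged into one with these bounds.
//!     Merged(Bounds),
//! }
//!
//! /// Enforces `A <: B` for two variables with bounds `a` and `b`.
//! fn make_subtype(a: Bounds, b: Bounds) -> Result<Outcome, String> {
//!     // Try to relate the existing bounds, but only if both are present.
//!     if a.ub != TOP && b.lb != BOT && a.ub <= b.lb {
//!         return Ok(Outcome::Related);
//!     }
//!     // Otherwise fall back to merging A and B into the same variable.
//!     let merged = Bounds { lb: max(a.lb, b.lb), ub: min(a.ub, b.ub) };
//!     if merged.lb <= merged.ub {
//!         Ok(Outcome::Merged(merged))
//!     } else {
//!         Err(format!("LB {} is not below UB {}", merged.lb, merged.ub))
//!     }
//! }
//! ```
//!
//! Note how a variable with a "lopsided" range such as `[2, TOP]` is
//! always merged in this sketch, which mirrors the loss of flexibility
//! described above.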
//!
//! ### What to do when not all bounds are present
//!
//! In the prior discussion we assumed that A.ub was not top and B.lb was
//! not bot. Unfortunately this is rarely the case. Often type variables
//! have "lopsided" bounds. For example, if a variable in the program has
//! been initialized but has not been used, then its corresponding type
//! variable will have a lower bound but no upper bound. When that
//! variable is then used, we would like to know its upper bound---but we
//! don't have one! In this case we'll do different things depending on
//! how the variable is being used.
//!
//! ## Transactional support
//!
//! Whenever we merge variables or adjust their bounds, we always
//! keep a record of the old value. This allows the changes to be undone.
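//!
//! One way to keep such a record is an undo log, sketched below. This is
//! an assumption about how the idea could be realized, not a description
//! of the actual data structures; `VarTable`, `Snapshot`, `snapshot`, and
//! `rollback_to` are hypothetical names, and bounds are toy integers.
//!
//! ```rust
//! /// Hypothetical per-variable state with toy integer bounds.
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! struct Bounds { lb: u32, ub: u32 }
//!
//! /// Hypothetical table of variable bounds with an undo log.
//! struct VarTable {
//!     bounds: Vec<Bounds>,
//!     /// (variable index, value before the change), newest last.
//!     undo_log: Vec<(usize, Bounds)>,
//! }
//!
//! /// Marks a position in the undo log that we can roll back to.
//! struct Snapshot { log_len: usize }
//!
//! impl VarTable {
//!     fn new() -> VarTable {
//!         VarTable { bounds: Vec::new(), undo_log: Vec::new() }
//!     }
//!
//!     /// Creates a variable with the widest possible bounds. For
//!     /// simplicity, variable creation itself is not logged.
//!     fn new_var(&mut self) -> usize {
//!         self.bounds.push(Bounds { lb: 0, ub: u32::MAX });
//!         self.bounds.len() - 1
//!     }
//!
//!     /// Overwrites the bounds of `var`, recording the old value first.
//!     fn set(&mut self, var: usize, new: Bounds) {
//!         self.undo_log.push((var, self.bounds[var]));
//!         self.bounds[var] = new;
//!     }
//!
//!     fn snapshot(&self) -> Snapshot {
//!         Snapshot { log_len: self.undo_log.len() }
//!     }
//!
//!     /// Undoes every change made since `snapshot`, newest first.
//!     fn rollback_to(&mut self, snapshot: Snapshot) {
//!         while self.undo_log.len() > snapshot.log_len {
//!             let (var, old) = self.undo_log.pop().unwrap();
//!             self.bounds[var] = old;
//!         }
//!     }
//! }
//! ```
//!
//! A caller would take a snapshot before a speculative step, such as
//! trying to relate two sets of bounds, and roll back if the step fails.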
//!
//! ## Regions
//!
//! I've only talked about type variables here, but region variables
//! follow the same principle. They have upper- and lower-bounds. A
//! region A is a subregion of a region B if A being valid implies that B
//! is valid. This basically corresponds to the block nesting structure:
//! the regions for outer block scopes are superregions of those for inner
//! block scopes.
//!
//! ## Integral and floating-point type variables
//!
//! There is a third variety of type variable that we use only for
//! inferring the types of unsuffixed integer literals. Integral type
//! variables differ from general-purpose type variables in that there's
//! no subtyping relationship among the various integral types, so instead
//! of associating each variable with an upper and lower bound, we just
//! use simple unification. Each integer variable is associated with at
//! most one integer type. Floating point types are handled similarly to
//! integral types.
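//!
//! "Simple unification" with at most one integer type per variable can be
//! sketched with a small union-find table. This is an illustrative
//! assumption rather than the actual implementation; `IntTy`, `IntVars`,
//! and their methods are hypothetical names.
//!
//! ```rust
//! /// A few concrete integer types, enough for the example.
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! enum IntTy { I32, I64, U8 }
//!
//! /// Hypothetical table of integer type variables.
//! struct IntVars {
//!     /// Union-find parent links; a root points at itself.
//!     parent: Vec<usize>,
//!     /// The integer type known for each root, if any.
//!     value: Vec<Option<IntTy>>,
//! }
//!
//! impl IntVars {
//!     fn new() -> IntVars {
//!         IntVars { parent: Vec::new(), value: Vec::new() }
//!     }
//!
//!     fn new_var(&mut self) -> usize {
//!         let id = self.parent.len();
//!         self.parent.push(id);
//!         self.value.push(None);
//!         id
//!     }
//!
//!     fn find(&self, mut v: usize) -> usize {
//!         while self.parent[v] != v {
//!             v = self.parent[v];
//!         }
//!         v
//!     }
//!
//!     /// Unifies a variable with a concrete integer type.
//!     fn unify_ty(&mut self, v: usize, ty: IntTy) -> Result<(), String> {
//!         let root = self.find(v);
//!         let current = self.value[root];
//!         match current {
//!             Some(old) if old != ty => {
//!                 Err(format!("mismatched integer types: {:?} vs {:?}", old, ty))
//!             }
//!             _ => {
//!                 self.value[root] = Some(ty);
//!                 Ok(())
//!             }
//!         }
//!     }
//!
//!     /// Unifies two variables so that they resolve to the same type.
//!     fn unify_vars(&mut self, a: usize, b: usize) -> Result<(), String> {
//!         let (ra, rb) = (self.find(a), self.find(b));
//!         if ra == rb {
//!             return Ok(());
//!         }
//!         let merged = match (self.value[ra], self.value[rb]) {
//!             (Some(x), Some(y)) if x != y => {
//!                 return Err(format!("mismatched integer types: {:?} vs {:?}", x, y));
//!             }
//!             (Some(x), _) => Some(x),
//!             (None, y) => y,
//!         };
//!         self.parent[rb] = ra;
//!         self.value[ra] = merged;
//!         Ok(())
//!     }
//!
//!     /// Returns the integer type the variable is known to have, if any.
//!     fn resolve(&self, v: usize) -> Option<IntTy> {
//!         self.value[self.find(v)]
//!     }
//! }
//! ```
//!
//! Because there is no subtyping among integer types, there are no bounds
//! to narrow here: two unified variables either agree on a type or the
//! unification is an error.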
//!
//! ## GLB/LUB
//!
//! Computing the greatest-lower-bound and least-upper-bound of two
//! types/regions is generally straightforward except when type variables
//! are involved. In that case, we follow a similar "try to use the bounds
//! when possible but otherwise merge the variables" strategy. In other
//! words, `GLB(A, B)` where `A` and `B` are variables will often result
//! in `A` and `B` being merged and the result being `A`.
//!
//! ## Type coercion
//!
//! We have a notion of assignability which differs somewhat from
//! subtyping; in particular it may cause region borrowing to occur. See
//! the big comment later in this file on Type Coercion for specifics.
//!
//! ### In conclusion
//!
//! I showed you three ways to relate `A` and `B`. There are also more,
//! of course, though I'm not sure if there are any more sensible options.
//! The main point is that there are various options, each of which
//! produces a distinct range of types for `A` and `B`. Depending on what
//! the correct values for A and B are, one of these options will be the
//! right choice: but of course we don't know the right values for A and B
//! yet; that's what we're trying to find! In our code, we opt to unify
//! (Option 1).
//!
//! # Implementation details
//!
//! We make use of a trait-like implementation strategy to consolidate
//! duplicated code between subtypes, GLB, and LUB computations. See the
//! section on "Type Combining" below for details.