rust/subtyping.md

177 lines
7.1 KiB
Markdown
Raw Normal View History

2015-07-06 20:36:16 -05:00
% Subtyping and Variance
Although Rust doesn't have any notion of inheritance, it *does* include subtyping.
In Rust, subtyping derives entirely from *lifetimes*. Since lifetimes are scopes,
we can partially order them based on a *contains* (outlives) relationship. We
can even express this as a generic bound: `T: 'a` specifies that whatever scope `T`
is valid for must contain the scope `'a` ("T outlives `'a`").
We can then define subtyping on lifetimes in terms of that relationship: if `'a: 'b`
("a contains b" or "a outlives b"), then `'a` is a subtype of `'b`. This is a
large source of confusion, because it seems intuitively backwards to many:
the bigger scope is a *sub type* of the smaller scope.
This does in fact make sense. The intuitive reason for this is that if you expect an
`&'a u8`, then it's totally fine for me to hand you an `&'static u8`, in the same way
that if you expect an Animal in Java, it's totally fine for me to hand you a Cat.
Cats are just Animals *and more*, just as `'static` is just `'a` *and more*.
(Note, the subtyping relationship and typed-ness of lifetimes is a fairly arbitrary
construct that some disagree with. I just find that it simplifies this analysis.)
Higher-ranked lifetimes are also subtypes of every concrete lifetime. This is because
taking an arbitrary lifetime is strictly more general than taking a specific one.
# Variance
Variance is where things get really harsh.
Variance is a property that *type constructors* have. A type constructor in Rust
is a generic type with unbound arguments. For instance `Vec` is a type constructor
that takes a `T` and returns a `Vec<T>`. `&` and `&mut` are type constructors that
take a lifetime and a type.
A type constructor's *variance* is how the subtypes of its inputs affects the
subtypes of its outputs. There are three kinds of variance:
* F is *variant* if `T` being a subtype of `U` implies `F<T>` is a subtype of `F<U>`
* F is *invariant* otherwise (no subtyping relation can be derived)
(For those of you who are familiar with variance from other languages, what we refer
to as "just" variance is in fact *covariance*. Rust does not have contravariance.
Historically Rust did have some contravariance but it was scrapped due to poor
interactions with other features.)
Some important variances:
* `&` is variant (as is `*const` by metaphor)
* `&mut` is invariant (as is `*mut` by metaphor)
* `Fn(T) -> U` is invariant with respect to `T`, but variant with respect to `U`
* `Box`, `Vec`, and all other collections are variant
* `UnsafeCell`, `Cell`, `RefCell`, `Mutex` and all "interior mutability"
types are invariant
To understand why these variances are correct and desirable, we will consider several
examples. We have already covered why `&` should be variant when introducing subtyping:
it's desirable to be able to pass longer-lived things where shorter-lived things are
needed.
To see why `&mut` should be invariant, consider the following code:
```rust
fn main() {
let mut forever_str: &'static str = "hello";
{
let string = String::from("world");
overwrite(&mut forever_str, &mut &*string);
}
println!("{}", forever_str);
}
fn overwrite<T: Copy>(input: &mut T, new: &mut T) {
*input = *new;
}
```
The signature of `overwrite` is clearly valid: it takes mutable references to two values
of the same type, and overwrites one with the other. We have seen already that `&` is
variant, and `'static` is a subtype of *any* `'a`, so `&'static str` is a
subtype of `&'a str`. Therefore, if `&mut` was
*also* variant, then the lifetime of the `&'static str` would successfully be
"shrunk" down to the shorter lifetime of the string, and `overwrite` would be
called successfully. The string would subsequently be dropped, and `forever_str`
would point to freed memory when we print it!
Therefore `&mut` should be invariant. This is the general theme of variance vs
invariance: if variance would allow you to *store* a short-lived value in a
longer-lived slot, then you must be invariant.
`Box` and `Vec` are interesting cases because they're variant, but you can
definitely store values in them! This is fine because *you can only store values
in them through a mutable reference*! The mutable reference makes the whole type
invariant, and therefore prevents you from getting in trouble.
Being variant allows them to be variant when shared immutably (so you can pass
a `&Box<&'static str>` where a `&Box<&'a str>` is expected). It also allows you to
forever weaken the type by moving it into a weaker slot. That is, you can do:
```rust
fn get_box<'a>(&'a u8) -> Box<&'a str> {
// string literals are `&'static str`s
Box::new("hello")
}
```
which is fine because unlike the mutable borrow case, there's no one else who
"remembers" the old lifetime in the box.
The variance of the cell types similarly follows. `&` is like an `&mut` for a
cell, because you can still store values in them through an `&`. Therefore cells
must be invariant to avoid lifetime smuggling.
`Fn` is the most subtle case, because it has mixed variance. To see why
`Fn(T) -> U` should be invariant over T, consider the following function
signature:
```rust
// 'a is derived from some parent scope
fn foo(&'a str) -> usize;
```
This signature claims that it can handle any &str that lives *at least* as long
as `'a`. Now if this signature was variant with respect to `&str`, that would mean
```rust
fn foo(&'static str) -> usize;
```
could be provided in its place, as it would be a subtype. However this function
has a *stronger* requirement: it says that it can *only* handle `&'static str`s,
and nothing else. Therefore functions are not variant over their arguments.
To see why `Fn(T) -> U` should be *variant* over U, consider the following
function signature:
```rust
// 'a is derived from some parent scope
fn foo(usize) -> &'a str;
```
This signature claims that it will return something that outlives `'a`. It is
therefore completely reasonable to provide
```rust
fn foo(usize) -> &'static str;
```
in its place. Therefore functions *are* variant over their return type.
`*const` has the exact same semantics as `&`, so variance follows. `*mut` on the
other hand can dereference to an &mut whether shared or not, so it is marked
as invariant in analogy to cells.
This is all well and good for the types the standard library provides, but
how is variance determined for type that *you* define? A struct, informally
speaking, inherits the variance of its fields. If a struct `Foo`
has a generic argument `A` that is used in a field `a`, then Foo's variance
over `A` is exactly `a`'s variance. However this is complicated if `A` is used
in multiple fields.
* If all uses of A are variant, then Foo is variant over A
* Otherwise, Foo is invariant over A
```rust
struct Foo<'a, 'b, A, B, C, D, E, F, G, H> {
a: &'a A, // variant over 'a and A
b: &'b mut B, // invariant over 'b and B
c: *const C, // variant over C
d: *mut D, // invariant over D
e: Vec<E>, // variant over E
f: Cell<F>, // invariant over F
g: G // variant over G
h1: H // would also be variant over H except...
h2: Cell<H> // invariant over H, because invariance wins
}
```