rust/repr-rust.md
2015-07-06 18:36:16 -07:00

3.8 KiB

% repr(Rust)

Rust gives you the following ways to lay out composite data:

  • structs (named product types)
  • tuples (anonymous product types)
  • arrays (homogeneous product types)
  • enums (named sum types -- tagged unions)

An enum is said to be C-like if none of its variants have associated data.

For all these, individual fields are aligned to their preferred alignment. For primitives this is usually equal to their size. For instance, a u32 will be aligned to a multiple of 32 bits, and a u16 will be aligned to a multiple of 16 bits. Composite structures will have a preferred alignment equal to the maximum of their fields' preferred alignment, and a size equal to a multiple of their preferred alignment. This ensures that arrays of T can be correctly iterated by offsetting by their size. So for instance,

struct A {
    a: u8,
    c: u32,
    b: u16,
}

will have a size that is a multiple of 32-bits, and 32-bit alignment.

There is no indirection for these types; all data is stored contiguously as you would expect in C. However with the exception of arrays (which are densely packed and in-order), the layout of data is not by default specified in Rust. Given the two following struct definitions:

struct A {
    a: i32,
    b: u64,
}

struct B {
    x: i32,
    b: u64,
}

Rust does guarantee that two instances of A have their data laid out in exactly the same way. However Rust does not guarantee that an instance of A has the same field ordering or padding as an instance of B (in practice there's no particular reason why they wouldn't, other than that its not currently guaranteed).

With A and B as written, this is basically nonsensical, but several other features of Rust make it desirable for the language to play with data layout in complex ways.

For instance, consider this struct:

struct Foo<T, U> {
    count: u16,
    data1: T,
    data2: U,
}

Now consider the monomorphizations of Foo<u32, u16> and Foo<u16, u32>. If Rust lays out the fields in the order specified, we expect it to pad the values in the struct to satisfy their alignment requirements. So if Rust didn't reorder fields, we would expect Rust to produce the following:

struct Foo<u16, u32> {
    count: u16,
    data1: u16,
    data2: u32,
}

struct Foo<u32, u16> {
    count: u16,
    _pad1: u16,
    data1: u32,
    data2: u16,
    _pad2: u16,
}

The latter case quite simply wastes space. An optimal use of space therefore requires different monomorphizations to have different field orderings.

Note: this is a hypothetical optimization that is not yet implemented in Rust 1.0

Enums make this consideration even more complicated. Naively, an enum such as:

enum Foo {
    A(u32),
    B(u64),
    C(u8),
}

would be laid out as:

struct FooRepr {
    data: u64, // this is *really* either a u64, u32, or u8 based on `tag`
    tag: u8, // 0 = A, 1 = B, 2 = C
}

And indeed this is approximately how it would be laid out in general (modulo the size and position of tag). However there are several cases where such a representation is ineffiecient. The classic case of this is Rust's "null pointer optimization". Given a pointer that is known to not be null (e.g. &u32), an enum can store a discriminant bit inside the pointer by using null as a special value. The net result is that size_of::<Option<&T>>() == size_of::<&T>()

There are many types in Rust that are, or contain, "not null" pointers such as Box<T>, Vec<T>, String, &T, and &mut T. Similarly, one can imagine nested enums pooling their tags into a single descriminant, as they are by definition known to have a limited range of valid values. In principle enums can use fairly elaborate algorithms to cache bits throughout nested types with special constrained representations. As such it is especially desirable that we leave enum layout unspecified today.