Datatypes
Rust datatypes are, by default, immutable. The core datatypes of Rust are structural records and 'tags' (tagged unions, algebraic data types).
type point = {x: float, y: float};
tag shape {
    circle(point, float);
    rectangle(point, point);
}
let my_shape = circle({x: 0.0, y: 0.0}, 10.0);
Records
Rust record types are written {field1: TYPE, field2: TYPE [, ...]},
and record literals are written in the same way, but with expressions
instead of types. They are quite similar to C structs, and even laid
out the same way in memory (so you can read from a Rust struct in C,
and vice-versa).
The dot operator is used to access record fields (mypoint.x).
Fields that you want to mutate must be explicitly marked as such. For example:
type stack = {content: [int], mutable head: uint};
With such a type, you can do mystack.head += 1u. When the mutable
keyword is omitted from the type, such an assignment would result in
a type error.
To 'update' an immutable record, you use functional record update
syntax, ending a record literal with the with keyword and the record
to copy from:
let oldpoint = {x: 10f, y: 20f};
let newpoint = {x: 0f with oldpoint};
assert newpoint == {x: 0f, y: 20f};
This will create a new record, copying all the fields from oldpoint
into it, except for the ones that are explicitly set in the literal.
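For comparison, present-day Rust expresses the same idea with struct update syntax. The sketch below is a translation under stated assumptions, not the dialect this tutorial describes: the Point struct, its f64 fields, and the with_x helper are hypothetical names chosen for illustration.

```rust
// Present-day Rust sketch of functional record update.
// `Point` and `with_x` are illustrative names, not part of the tutorial.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

// Returns a copy of `old` in which only `x` has been replaced.
fn with_x(old: Point, x: f64) -> Point {
    // `..old` fills in every field that is not set explicitly.
    Point { x, ..old }
}
```

Calling with_x on a point leaves the original value untouched, which mirrors the oldpoint/newpoint example above.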
Rust record types are structural. This means that {x: float, y: float}
is not just a way to define a new type, but is the actual name
of the type. Record types can be used without first defining them. If
module A defines type point = {x: float, y: float}, and module B,
without knowing anything about A, defines a function that returns an
{x: float, y: float}, you can use that return value as a point in
module A. (Remember that type defines an additional name for a type,
not an actual new type.)
Record patterns
Records can be destructured in alt patterns. The basic syntax is
{fieldname: pattern, ...}, but the pattern for a field can be
omitted as a shorthand for simply binding the variable with the same
name as the field.
alt mypoint {
    {x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
    {x, y} { /* Simply bind the fields */ }
}
When you are not interested in all the fields of a record, a record
pattern may end with , _ (as in {field1, _}) to indicate that
you're ignoring all other fields.
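The three pattern forms (sub-patterns, shorthand binding, and ignoring the remaining fields) survive in present-day Rust in slightly different clothing. The following is a sketch in modern syntax rather than the alt syntax shown here; GridPoint, its integer fields, and describe are hypothetical names, and integers are used so that a literal can appear in a pattern.

```rust
// Present-day Rust sketch of record (struct) patterns.
// `GridPoint` and `describe` are illustrative, not from the tutorial.
struct GridPoint {
    x: i32,
    y: i32,
    label: &'static str,
}

fn describe(p: &GridPoint) -> String {
    match p {
        // Sub-pattern for `x`, a new name for `y`; `..` ignores `label`.
        GridPoint { x: 0, y: y_name, .. } => format!("on the y axis at {}", y_name),
        // Shorthand: bind each field to a variable of the same name.
        GridPoint { x, y, label } => format!("{} at ({}, {})", label, x, y),
    }
}
```

Here the trailing .. plays the role that , _ plays in the record patterns described above.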
Tags
Tags are datatypes that have several different representations. For example, the type shown earlier:
tag shape {
    circle(point, float);
    rectangle(point, point);
}
A value of this type is either a circle, in which case it contains a point record and a float, or a rectangle, in which case it contains two point records. The run-time representation of such a value includes an identifier of the actual form that it holds, much like the 'tagged union' pattern in C, but with better ergonomics.
The above declaration will define a type shape that can be used to
refer to such shapes, and two functions, circle and rectangle,
which can be used to construct values of the type (taking arguments of
the specified types). So circle({x: 0f, y: 0f}, 10f) is the way to
create a new circle.
Tag variants do not have to have parameters. This, for example, is
equivalent to an enum in C:
tag direction {
    north;
    east;
    south;
    west;
};
This will define north, east, south, and west as constants,
all of which have type direction.
There is a special case for tags with a single variant. These are used to define new types in such a way that the new name is not just a synonym for an existing type, but its own distinct type. If you say:
tag gizmo_id = int;
That is a shorthand for this:
tag gizmo_id { gizmo_id(int); }
Tag types like this can have their content extracted with the
dereference (*) unary operator:
let my_gizmo_id = gizmo_id(10);
let id_int: int = *my_gizmo_id;
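In present-day Rust the closest analogue is a single-field tuple struct, often called a newtype. The sketch below is a modern-syntax assumption, not the tag shorthand described here: GizmoId is a hypothetical name, and the Deref implementation is added only to mirror the * extraction shown above (field access via .0 is the more common way to reach the contents).

```rust
use std::ops::Deref;

// Present-day Rust sketch of a distinct type wrapping an integer.
// `GizmoId` is an illustrative name; it is not interchangeable with i32.
struct GizmoId(i32);

// Opt-in dereferencing, so that `*id` yields the wrapped integer.
impl Deref for GizmoId {
    type Target = i32;

    fn deref(&self) -> &i32 {
        &self.0
    }
}
```

With this in place, let n: i32 = *GizmoId(10); extracts the integer, while passing a GizmoId where an i32 is expected is still a type error.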
Tag patterns
For tag types with multiple variants, destructuring is the only way to
get at their contents. All variant constructors can be used as
patterns, as in this definition of area:
fn area(sh: shape) -> float {
    alt sh {
        circle(_, size) { std::math::pi * size * size }
        rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
    }
}
For variants without arguments, you have to write variantname. (with
a dot at the end) to match them in a pattern. This is to prevent
ambiguity between matching a variant name and binding a new variable.
fn point_from_direction(dir: direction) -> point {
    alt dir {
        north. { {x: 0f, y: 1f} }
        east. { {x: 1f, y: 0f} }
        south. { {x: 0f, y: -1f} }
        west. { {x: -1f, y: 0f} }
    }
}
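For readers who know present-day Rust, tags correspond to enums and alt to match. The following is a sketch in modern syntax under that assumption, not the dialect of this tutorial; the capitalized names and f64 fields are illustrative choices.

```rust
// Present-day Rust sketch of a tagged union and destructuring.
// Names are illustrative; the tutorial's own syntax is `tag` and `alt`.
struct Point {
    x: f64,
    y: f64,
}

enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Point),
}

fn area(sh: &Shape) -> f64 {
    match sh {
        // The center is irrelevant to the area, so it is ignored with `_`.
        Shape::Circle(_, radius) => std::f64::consts::PI * radius * radius,
        // Both corners are bound and their fields read with the dot operator.
        Shape::Rectangle(a, b) => (b.x - a.x) * (b.y - a.y),
    }
}
```

Because modern variants are written with a path such as Shape::Circle, the ambiguity between a variant name and a fresh variable binding is resolved without a trailing dot.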
Tuples
Tuples in Rust behave exactly like records, except that their fields
do not have names (and can thus not be accessed with dot notation).
Tuples can have any arity except for 0 or 1 (though you may see nil,
(), as the empty tuple if you like).
let mytup: (int, int, float) = (10, 20, 30.0);
alt mytup {
    (a, b, c) { log a + b + (c as int); }
}
Pointers
In contrast to a lot of modern languages, record and tag types in Rust
are not represented as pointers to allocated memory. They are, like in
C and C++, represented directly. This means that if you write
let x = {x: 1f, y: 1f}; you are creating a record on the stack. If you
then copy it into a data structure, the whole record is copied, not
just a pointer.
For small records like point, this is usually still more efficient
than allocating memory and going through a pointer. But for big
records, or records with mutable fields, it can be useful to have a
single copy on the heap, and refer to that through a pointer.
Rust supports several types of pointers. The simplest is the unsafe
pointer, written *TYPE, which is a completely unchecked pointer
type only used in unsafe code (and thus, in typical Rust code, very
rarely). The safe pointer types are @TYPE for shared,
reference-counted boxes, and ~TYPE for uniquely-owned pointers.
All pointer types can be dereferenced with the * unary operator.
Shared boxes
Shared boxes are pointers to heap-allocated, reference counted memory. A cycle collector ensures that circular references do not result in memory leaks.
Creating a shared box is done by simply applying the unary @
operator to an expression. The result of the expression will be boxed,
resulting in a box of the right type. For example:
let x = @10; // New box, refcount of 1
let y = x; // Copy the pointer, increase refcount
// When x and y go out of scope, refcount goes to 0, box is freed
NOTE: We may in the future switch to garbage collection, rather than reference counting, for shared boxes.
Shared boxes never cross task boundaries.
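The reference-counting behaviour can be observed directly in present-day Rust, where std::rc::Rc fills a similar role. This is a sketch under that assumption, not the @ boxes of this tutorial: Rc has no cycle collector, and refcount_demo is a hypothetical helper written for illustration.

```rust
use std::rc::Rc;

// Present-day Rust sketch: observe the reference count of a shared box.
// Returns the count before sharing, while shared, and after one owner is gone.
fn refcount_demo() -> (usize, usize, usize) {
    let x = Rc::new(10); // New box, one owner.
    let before = Rc::strong_count(&x);

    let y = Rc::clone(&x); // Copy the pointer, not the integer.
    let during = Rc::strong_count(&x);

    drop(y); // One owner goes away; the box stays alive through `x`.
    let after = Rc::strong_count(&x);

    (before, during, after)
}
```

Like the shared boxes described here, an Rc cannot be handed to another thread; the compiler rejects such code.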
Unique boxes
In contrast to shared boxes, unique boxes are not reference counted. Instead, it is statically guaranteed that only a single owner of the box exists at any time.
let x = ~10;
let y <- x;
This is where the 'move' (<-) operator comes in. It is similar to
=, but it de-initializes its source. Thus, the unique box can move
from x to y, without violating the constraint that it only has a
single owner.
NOTE: If you do y = x instead, the box will be copied. We should
emit a warning for this, or disallow it entirely, but do not currently
do so.
Unique boxes, when they do not contain any shared boxes, can be sent to other tasks. The sending task will give up ownership of the box, and won't be able to access it afterwards. The receiving task will become the sole owner of the box.
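Single ownership and hand-off to another task can be sketched in present-day Rust with Box and std::thread. This is a modern-syntax illustration under stated assumptions, not the ~ and <- operators described here: in modern Rust plain assignment already moves, and send_box_to_thread is a hypothetical helper.

```rust
use std::thread;

// Present-day Rust sketch: move a uniquely owned box into another thread.
fn send_box_to_thread() -> i32 {
    let x: Box<i32> = Box::new(10);

    // Assignment moves ownership; `x` can no longer be used after this line.
    let y = x;

    // `move` transfers `y` into the new thread, which becomes its sole owner.
    let handle = thread::spawn(move || *y + 1);

    // Wait for the worker and collect its result.
    handle.join().expect("worker thread panicked")
}
```

Any attempt to read x or y in the sending function after the move is rejected at compile time, which is the static guarantee this section describes.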
Mutability
All pointer types have a mutable variant, written @mutable TYPE or
~mutable TYPE. Given such a pointer, you can write to its contents
by combining the dereference operator with a mutating action.
fn increase_contents(pt: @mutable int) {
    *pt += 1;
}
Vectors
Rust vectors are always heap-allocated and unique. A value of type
[TYPE] is represented by a pointer to a section of heap memory
containing any number of TYPE values.
NOTE: This uniqueness is turning out to be quite awkward in practice, and might change.
Vector literals are enclosed in square brackets. Indexing is also done with square brackets, and is zero-based:
let myvec = [true, false, true, false];
if myvec[1] { std::io::println("boom"); }
By default, vectors are immutable: you can not replace their elements.
The type written as [mutable TYPE] is a vector with mutable
elements. Mutable vector literals are written [mutable] (empty) or
[mutable 1, 2, 3] (with elements).
Growing a vector in Rust is not as inefficient as it looks (the +
operator means concatenation when applied to vector types):
let myvec = [], i = 0;
while i < 100 {
    myvec += [i];
    i += 1;
}
Because a vector is unique, replacing it with a longer one (which is
what += [i] does) is indistinguishable from appending to it
in-place. Vector representations are optimized so that the number of
reallocations grows only logarithmically with the vector's length, so
the above code generates about the same amount of copying and
reallocation as push implementations in most other languages.
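The claim about reallocation can be checked empirically in present-day Rust by watching a Vec's capacity while appending. This is a sketch in modern syntax, not the += [i] form above; count_reallocations is a hypothetical helper, and the exact growth strategy of Vec is an implementation detail, so only the rough magnitude is meaningful.

```rust
// Present-day Rust sketch: count how often a growing vector changes capacity.
// A capacity change is used here as a stand-in for "a reallocation happened".
fn count_reallocations(n: usize) -> usize {
    let mut v: Vec<usize> = Vec::new();
    let mut reallocs = 0;
    let mut cap = v.capacity();

    for i in 0..n {
        v.push(i);
        if v.capacity() != cap {
            // The buffer was replaced by a larger one.
            reallocs += 1;
            cap = v.capacity();
        }
    }
    reallocs
}
```

For a hundred appends the count stays far below one hundred, because each reallocation reserves room for many future elements.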
Strings
The str type in Rust is represented exactly the same way as a vector
of bytes ([u8]), except that it is guaranteed to have a trailing
null byte (for interoperability with C APIs).
This sequence of bytes is interpreted as a UTF-8 encoded sequence of characters. This has the advantage that UTF-8 encoded I/O (which should really be the goal for modern systems) is very fast, and that strings have, for most intents and purposes, a nicely compact representation. It has the disadvantage that you only get constant-time access by byte, not by character.
A lot of algorithms don't need constant-time indexed access (they
iterate over all characters, which std::str::chars helps with), and
for those that do, many don't need actual characters, and can operate
on bytes. For algorithms that do really need to index by character,
there's the option to convert your string to a character vector (using
std::str::to_chars).
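The difference between byte access and character access is easy to see with a non-ASCII string. The sketch below uses present-day Rust, where the chars method on str is the counterpart of the functions named above; byte_and_char_views is a hypothetical helper, and modern strings carry no trailing null byte.

```rust
// Present-day Rust sketch: byte length, character count, and a character
// vector for the same UTF-8 string.
fn byte_and_char_views(s: &str) -> (usize, usize, Vec<char>) {
    let byte_len = s.len(); // Constant time: number of bytes.
    let char_count = s.chars().count(); // Linear time: decodes the UTF-8.
    let chars: Vec<char> = s.chars().collect(); // Indexable by character.
    (byte_len, char_count, chars)
}
```

For the string "h\u{e9}llo" the byte length is one larger than the character count, since the accented letter occupies two bytes in UTF-8.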
Like vectors, strings are always unique. You can wrap them in a shared box to share them. Unlike vectors, there is no mutable variant of strings. They are always immutable.
Resources
Resources are data types that have a destructor associated with them.
resource file_desc(fd: int) {
    close_file_desc(fd);
}
This defines a type file_desc and a constructor of the same name,
which takes an integer. Values of such a type can not be copied, and
when they are destroyed (by going out of scope, or, when boxed, when
their box is cleaned up), their body runs. In the example above, this
would cause the given file descriptor to be closed.
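In present-day Rust the same behaviour is obtained by implementing the Drop trait on an ordinary struct. The following is a sketch under that assumption, not the resource syntax shown here: FileDesc and open_and_drop are hypothetical names, and a shared cell records the "closed" descriptor in place of a real close_file_desc call.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Present-day Rust sketch of a type with a destructor.
// `closed` is a stand-in that records which descriptor was released.
struct FileDesc {
    fd: i32,
    closed: Rc<Cell<Option<i32>>>,
}

impl Drop for FileDesc {
    fn drop(&mut self) {
        // A real implementation would close the descriptor here.
        self.closed.set(Some(self.fd));
    }
}

// Creates a FileDesc in an inner scope and reports what its destructor saw.
fn open_and_drop(fd: i32) -> Option<i32> {
    let closed = Rc::new(Cell::new(None));
    {
        let _f = FileDesc { fd, closed: Rc::clone(&closed) };
        // Still alive: the destructor has not run yet.
        assert_eq!(closed.get(), None);
    } // `_f` goes out of scope here, so `drop` runs.
    closed.get()
}
```

As with resources, a FileDesc here cannot be copied implicitly, so the destructor runs exactly once per value.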
NOTE: We're considering alternative approaches for data types with destructors. Resources might go away in the future.