# **This is a work in progress**

% The Rust Grammar

# Introduction

This document is the primary reference for the Rust programming language grammar. It
provides only one kind of material:

  - Chapters that formally define the language grammar.

This document does not serve as an introduction to the language. Background
familiarity with the language is assumed. A separate [guide] is available to
help acquire such background familiarity.

This document also does not serve as a reference to the [standard] library
included in the language distribution. Those libraries are documented
separately by extracting documentation attributes from their source code. Many
of the features that one might expect to be language features are library
features in Rust, so what you're looking for may be there, not here.

[guide]: guide.html
[standard]: std/index.html

# Notation

Rust's grammar is defined over Unicode codepoints, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
supported by common automated LL(k) parsing tools such as `llgen`, rather than
the dialect given in ISO 14977. The dialect can be defined self-referentially
as follows:

```antlr
grammar : rule + ;
rule    : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
```

Where:

- Whitespace in the grammar is ignored.
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
  Unicode codepoint `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeats` forms apply to the adjacent `element`, and are as follows:
  - `?` means zero or one repetition
  - `*` means zero or more repetitions
  - `+` means one or more repetitions
  - NUMBER trailing a repeat symbol gives a maximum repetition count
  - NUMBER on its own gives an exact repetition count

This EBNF dialect should be familiar to many readers.

## Unicode productions

A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
range. We define these productions in terms of character properties specified
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.

## String table productions

Some rules in the grammar (notably [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), and [keywords](#keywords)) are
given in a simplified form: as a listing of a table of unquoted, printable
whitespace-separated strings. These cases form a subset of the rules regarding
the [token](#tokens) rule, and are assumed to be the result of a
lexical-analysis phase feeding the parser, driven by a DFA, operating over the
disjunction of all such string table entries.

When such a string enclosed in double-quotes (`"`) occurs inside the grammar,
it is an implicit reference to a single member of such a string table
production. See [tokens](#tokens) for more information.

# Lexical structure

## Input format

Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
codepoints, but a small number are defined in terms of Unicode properties or
explicit codepoint lists. [^inputformat]

[^inputformat]: Substitute definitions for the special Unicode productions are
  provided to the grammar verifier, restricted to ASCII range, when verifying the
  grammar in this document.
## Special Unicode Productions

The following productions in the Rust grammar are defined in terms of Unicode
properties: `ident`, `non_null`, `non_star`, `non_eol`, `non_slash_or_star`,
`non_single_quote` and `non_double_quote`.

### Identifiers

The `ident` production is any nonempty Unicode string of the following form:

- The first character has property `XID_start`
- The remaining characters have property `XID_continue`

that does _not_ occur in the set of [keywords](#keywords).

> **Note**: `XID_start` and `XID_continue` as character properties cover the
> character ranges used to form the more familiar C and Java language-family
> identifiers.

### Delimiter-restricted productions

Some productions are defined by exclusion of particular Unicode characters:

- `non_null` is any single Unicode character aside from `U+0000` (null)
- `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`)
- `non_star` is `non_null` restricted to exclude `U+002A` (`*`)
- `non_slash_or_star` is `non_null` restricted to exclude `U+002F` (`/`) and `U+002A` (`*`)
- `non_single_quote` is `non_null` restricted to exclude `U+0027` (`'`)
- `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`)

## Comments

```antlr
comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
```

**FIXME:** add doc grammar?

## Whitespace

```antlr
whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;
```

## Tokens

```antlr
simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;
```

### Keywords

|          |          |          |          |        |
|----------|----------|----------|----------|--------|
| abstract | alignof  | as       | be       | box    |
| break    | const    | continue | crate    | do     |
| else     | enum     | extern   | false    | final  |
| fn       | for      | if       | impl     | in     |
| let      | loop     | match    | mod      | move   |
| mut      | offsetof | once     | override | priv   |
| proc     | pub      | pure     | ref      | return |
| sizeof   | static   | self     | struct   | super  |
| true     | trait    | type     | typeof   | unsafe |
| unsized  | use      | virtual  | where    | while  |
| yield    |          |          |          |        |

Each of these keywords has special meaning in its grammar, and all of them are
excluded from the `ident` rule.

### Literals

```antlr
lit_suffix : ident;
literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit ] lit_suffix ?;
```

#### Character and string literals

```antlr
char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
          | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
            | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
              | 'n' | 'r' | 't' | '0'
              | 'x' hex_digit 2 ;
unicode_escape : 'u' hex_digit 4
               | 'U' hex_digit 8 ;

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
          | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
          | dec_digit ;
oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
           | '5' | '6' | '7' | '8' | '9' ;
```

#### Byte and byte string literals

```antlr
byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" byte_string_body * '\x22' | "br" raw_byte_string ;

byte_body : ascii_non_single_quote
          | '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
            | '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
```

#### Number literals

```antlr
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
        | '0' [       [ dec_digit | '_' ] * float_suffix ?
              | 'b'   [ '1' | '0' | '_' ] +
              | 'o'   [ oct_digit | '_' ] +
              | 'x'   [ hex_digit | '_' ] +  ] ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
```

#### Boolean literals

**FIXME:** write grammar

The two values of the boolean type are written `true` and `false`.

### Symbols

```antlr
symbol : "::" | "->"
       | '#' | '[' | ']' | '(' | ')' | '{' | '}'
       | ',' | ';' ;
```

Symbols are a general class of printable [token](#tokens) that play structural
roles in a variety of grammar productions. They are catalogued here for
completeness as the set of remaining miscellaneous printable tokens that do not
otherwise appear as [unary operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords](#keywords).

## Paths

```antlr
expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | expr_path ;

type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | "::" type_path ;
```

# Syntax extensions

## Macros

```antlr
expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
        | '{' matcher * '}' | '$' ident ':' ident
        | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
        | non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
            | '{' transcriber * '}' | '$' ident
            | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
            | non_special_token ;
```

# Crates and source files

**FIXME:** grammar? What production covers `#![crate_id = "foo"]` ?

# Items and attributes

**FIXME:** grammar?

## Items

```antlr
item : mod_item | fn_item | type_item | struct_item | enum_item
     | static_item | trait_item | impl_item | extern_block ;
```

### Type Parameters

**FIXME:** grammar?
### Modules

```antlr
mod_item : "mod" ident [ ';' | '{' mod '}' ] ;
mod : [ view_item | item ] * ;
```

#### View items

```antlr
view_item : extern_crate_decl | use_decl ;
```

##### Extern crate declarations

```antlr
extern_crate_decl : "extern" "crate" crate_name ;
crate_name : ident | [ string_lit "as" ident ] ;
```

##### Use declarations

```antlr
use_decl : "pub" ? "use" [ path "as" ident
                          | path_glob ] ;

path_glob : ident [ "::" [ path_glob
                          | '*' ] ] ?
          | '{' path_item [ ',' path_item ] * '}' ;

path_item : ident | "mod" ;
```

### Functions

**FIXME:** grammar?

#### Generic functions

**FIXME:** grammar?

#### Unsafety

**FIXME:** grammar?

##### Unsafe functions

**FIXME:** grammar?

##### Unsafe blocks

**FIXME:** grammar?

#### Diverging functions

**FIXME:** grammar?

### Type definitions

**FIXME:** grammar?

### Structures

**FIXME:** grammar?

### Constant items

```antlr
const_item : "const" ident ':' type '=' expr ';' ;
```

### Static items

```antlr
static_item : "static" ident ':' type '=' expr ';' ;
```

#### Mutable statics

**FIXME:** grammar?

### Traits

**FIXME:** grammar?

### Implementations

**FIXME:** grammar?

### External blocks

```antlr
extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;
```

## Visibility and Privacy

**FIXME:** grammar?

### Re-exporting and Visibility

**FIXME:** grammar?

## Attributes

```antlr
attribute : "#!" ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
                  | '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;
```

# Statements and expressions

## Statements

**FIXME:** grammar?

### Declaration statements

**FIXME:** grammar?

A _declaration statement_ is one that introduces one or more *names* into the
enclosing statement block. The declared names may denote new slots or new
items.

#### Item declarations

**FIXME:** grammar?

An _item declaration statement_ has a syntactic form identical to an
[item](#items) declaration within a module.
Declaring an item (a function, enumeration, structure, type, static, trait,
implementation or module) locally within a statement block is simply a way of
restricting its scope to a narrow region containing all of its uses; it is
otherwise identical in meaning to declaring the item outside the statement
block.

#### Slot declarations

```antlr
let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;
```

### Expression statements

**FIXME:** grammar?

## Expressions

**FIXME:** grammar?

#### Lvalues, rvalues and temporaries

**FIXME:** grammar?

#### Moved and copied types

**FIXME:** Do we want to capture this in the grammar as different productions?

### Literal expressions

**FIXME:** grammar?

### Path expressions

**FIXME:** grammar?

### Tuple expressions

**FIXME:** grammar?

### Unit expressions

**FIXME:** grammar?

### Structure expressions

```antlr
struct_expr : expr_path '{' ident ':' expr
                      [ ',' ident ':' expr ] *
                      [ ".." expr ] '}' |
              expr_path '(' expr
                      [ ',' expr ] * ')' |
              expr_path ;
```

### Block expressions

```antlr
block_expr : '{' [ view_item ] *
                 [ stmt ';' | item ] *
                 [ expr ] '}' ;
```

### Method-call expressions

```antlr
method_call_expr : expr '.' ident paren_expr_list ;
```

### Field expressions

```antlr
field_expr : expr '.' ident ;
```

### Array expressions

```antlr
array_expr : '[' "mut" ? array_elems? ']' ;

array_elems : [expr [',' expr]*] | [expr ',' ".." expr] ;
```

### Index expressions

```antlr
idx_expr : expr '[' expr ']' ;
```

### Unary operator expressions

**FIXME:** grammar?

### Binary operator expressions

```antlr
binop_expr : expr binop expr ;
```

#### Arithmetic operators

**FIXME:** grammar?

#### Bitwise operators

**FIXME:** grammar?

#### Lazy boolean operators

**FIXME:** grammar?

#### Comparison operators

**FIXME:** grammar?

#### Type cast expressions

**FIXME:** grammar?

#### Assignment expressions

**FIXME:** grammar?

#### Compound assignment expressions

**FIXME:** grammar?
#### Operator precedence

The precedence of Rust binary operators is ordered as follows, going from
strong to weak, with one precedence level per line:

```
* / %
as
+ -
<< >>
&
^
|
< > <= >= == !=
&&
||
=
```

Operators at the same precedence level are evaluated left-to-right. [Unary
operators](#unary-operator-expressions) all share a single precedence level,
which is stronger than that of any binary operator.

### Grouped expressions

```antlr
paren_expr : '(' expr ')' ;
```

### Call expressions

```antlr
expr_list : [ expr [ ',' expr ]* ] ? ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;
```

### Lambda expressions

```antlr
ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;
```

### While loops

```antlr
while_expr : "while" no_struct_literal_expr '{' block '}' ;
```

### Infinite loops

```antlr
loop_expr : [ lifetime ':' ] "loop" '{' block '}';
```

### Break expressions

```antlr
break_expr : "break" [ lifetime ];
```

### Continue expressions

```antlr
continue_expr : "continue" [ lifetime ];
```

### For expressions

```antlr
for_expr : "for" pat "in" no_struct_literal_expr '{' block '}' ;
```

### If expressions

```antlr
if_expr : "if" no_struct_literal_expr '{' block '}'
          else_tail ? ;

else_tail : "else" [ if_expr | if_let_expr
                   | '{' block '}' ] ;
```

### Match expressions

```antlr
match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;

match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;

match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
```

### If let expressions

```antlr
if_let_expr : "if" "let" pat '=' expr '{' block '}'
               else_tail ? ;
else_tail : "else" [ if_expr | if_let_expr | '{' block '}' ] ;
```

### While let loops

```antlr
while_let_expr : "while" "let" pat '=' expr '{' block '}' ;
```

### Return expressions

```antlr
return_expr : "return" expr ? ;
```

# Type system

## Types

Every slot, item and value in a Rust program has a type. The _type_ of a
*value* defines the interpretation of the memory holding it.
Built-in types and type-constructors are tightly integrated into the language,
in nontrivial ways that are not possible to emulate in user-defined types.
User-defined types have limited capabilities.

### Primitive types

The primitive types are the following:

* The "unit" type `()`, having the single "unit" value `()` (occasionally called
  "nil"). [^unittype]
* The boolean type `bool` with values `true` and `false`.
* The machine types.
* The machine-dependent integer and floating-point types.

[^unittype]: The "unit" value `()` is *not* a sentinel "null pointer" value for
    reference slots; the "unit" type is the implicit return type from functions
    otherwise lacking a return type, and can be used in other contexts (such as
    message-sending or type-parametric code) as a zero-size type.

#### Machine types

The machine types are the following:

* The unsigned word types `u8`, `u16`, `u32` and `u64`, with values drawn from
  the integer intervals [0, 2^8 - 1], [0, 2^16 - 1], [0, 2^32 - 1] and
  [0, 2^64 - 1] respectively.

* The signed two's complement word types `i8`, `i16`, `i32` and `i64`, with
  values drawn from the integer intervals [-(2^(7)), 2^7 - 1],
  [-(2^(15)), 2^15 - 1], [-(2^(31)), 2^31 - 1], [-(2^(63)), 2^63 - 1]
  respectively.

* The IEEE 754-2008 `binary32` and `binary64` floating-point types: `f32` and
  `f64`, respectively.

#### Machine-dependent integer types

The `uint` type is an unsigned integer type with the same number of bits as the
platform's pointer type. It can represent every memory address in the process.

The `int` type is a signed integer type with the same number of bits as the
platform's pointer type. The theoretical upper bound on object and array size
is the maximum `int` value. This ensures that `int` can be used to calculate
differences between pointers into an object or array and can address every byte
within an object along with one byte past the end.

### Textual types

The types `char` and `str` hold textual data.
A value of type `char` is a [Unicode scalar value](
http://www.unicode.org/glossary/#unicode_scalar_value) (i.e. a code point that
is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to
0xD7FF or 0xE000 to 0x10FFFF range. A `[char]` array is effectively a UCS-4 /
UTF-32 string.

A value of type `str` is a Unicode string, represented as an array of 8-bit
unsigned bytes holding a sequence of UTF-8 encoded codepoints. Since `str` is of
unknown size, it is not a _first class_ type, but can only be instantiated
through a pointer type, such as `&str` or `String`.

### Tuple types

A tuple *type* is a heterogeneous product of other types, called the *elements*
of the tuple. It has no nominal name and is instead structurally typed.

Tuple types and values are denoted by listing the types or values of their
elements, respectively, in a parenthesized, comma-separated list.

Because tuple elements don't have a name, they can only be accessed by
pattern-matching.

The members of a tuple are laid out in memory contiguously, in the order
specified by the tuple type.

An example of a tuple type and its use:

```
type Pair<'a> = (int, &'a str);
let p: Pair<'static> = (10, "hello");
let (a, b) = p;
assert!(b != "world");
```

### Array, and Slice types

Rust has two different types for a list of items:

* `[T, ..N]`, an 'array'
* `&[T]`, a 'slice'.

An array has a fixed size, and can be allocated on either the stack or the
heap.

A slice is a 'view' into an array. It doesn't own the data it points to, it
borrows it.

An example of each kind:

```{rust}
let vec: Vec<int> = vec![1, 2, 3];
let arr: [int, ..3] = [1, 2, 3];
let s: &[int] = vec.as_slice();
```

As you can see, the `vec!` macro allows you to create a `Vec<T>` easily. The
`vec!` macro is also part of the standard library, rather than the language.

All in-bounds elements of arrays and slices are always initialized, and access
to an array or slice is always bounds-checked.
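The distinction between owning arrays and borrowing slices, and the bounds checking mentioned above, can be sketched as follows. This is a minimal illustration written in present-day Rust, not in the dialect this document describes: it assumes `i32` in place of `int` and the `[T; N]` array syntax in place of `[T, ..N]`, and the helper names `sum` and `nth` are hypothetical.

```rust
// Assumption: present-day Rust syntax (`i32`, `[T; N]`), not the dialect above.

/// Sums any borrowed view, whether it comes from an array or a vector.
/// (`sum` is a hypothetical helper name.)
fn sum(items: &[i32]) -> i32 {
    items.iter().sum()
}

/// Checked access: `get` returns `None` for an out-of-bounds index,
/// whereas `items[index]` would panic. (`nth` is a hypothetical helper name.)
fn nth(items: &[i32], index: usize) -> Option<i32> {
    items.get(index).copied()
}

fn main() {
    let arr: [i32; 3] = [1, 2, 3]; // fixed-size array, here on the stack
    let vec: Vec<i32> = vec![4, 5, 6]; // growable library type, heap-allocated
    let s: &[i32] = &arr[1..]; // slice borrowing the last two elements of `arr`

    println!("{} {} {}", sum(&arr), sum(&vec), sum(s));
    println!("{:?} {:?}", nth(s, 0), nth(s, 5));
}
```

Both `&arr` and `&vec` coerce to `&[i32]`, which is why a single function taking a slice can serve either owner.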
### Structure types

A `struct` *type* is a heterogeneous product of other types, called the
*fields* of the type.[^structtype]

[^structtype]: `struct` types are analogous to `struct` types in C,
    the *record* types of the ML family,
    or the *structure* types of the Lisp family.

New instances of a `struct` can be constructed with a [struct
expression](#structure-expressions).

The memory layout of a `struct` is undefined by default to allow for compiler
optimizations like field reordering, but it can be fixed with the
`#[repr(...)]` attribute. In either case, fields may be given in any order in a
corresponding struct *expression*; the resulting `struct` value will always
have the same memory layout.

The fields of a `struct` may be qualified by [visibility
modifiers](#re-exporting-and-visibility), to allow access to data in a
structure outside a module.

A _tuple struct_ type is just like a structure type, except that the fields are
anonymous.

A _unit-like struct_ type is like a structure type, except that it has no
fields. The one value constructed by the associated [structure
expression](#structure-expressions) is the only value that inhabits such a
type.

### Enumerated types

An *enumerated type* is a nominal, heterogeneous disjoint union type, denoted
by the name of an [`enum` item](#enumerations). [^enumtype]

[^enumtype]: The `enum` type is analogous to a `data` constructor declaration in
             ML, or a *pick ADT* in Limbo.

An [`enum` item](#enumerations) declares both the type and a number of *variant
constructors*, each of which is independently named and takes an optional tuple
of arguments.

New instances of an `enum` can be constructed by calling one of the variant
constructors, in a [call expression](#call-expressions).

Any `enum` value consumes as much memory as the largest variant constructor for
its corresponding `enum` type.

Enum types cannot be denoted *structurally* as types, but must be denoted by
named reference to an [`enum` item](#enumerations).
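The three struct flavours and an enum with variant constructors can be sketched together. This is an illustrative example in present-day Rust; every type and function name in it (`Point`, `Meters`, `Marker`, `Shape`, `describe`) is hypothetical and not taken from this document.

```rust
use std::mem::size_of;

// All names below are hypothetical, chosen only for illustration.
struct Point { x: i32, y: i32 } // struct with named fields
struct Meters(f64);             // tuple struct: anonymous fields
struct Marker;                  // unit-like struct: no fields, one value

enum Shape {
    Empty,                 // variant without arguments
    Circle(Point, f64),    // variant taking a tuple of arguments
    Segment(Point, Point),
}

/// Values of an enum are taken apart by pattern matching on the variant.
fn describe(shape: &Shape) -> String {
    match shape {
        Shape::Empty => String::from("empty"),
        Shape::Circle(c, r) => format!("circle at ({}, {}) radius {}", c.x, c.y, r),
        Shape::Segment(a, b) => format!("segment ({}, {})-({}, {})", a.x, a.y, b.x, b.y),
    }
}

fn main() {
    // Fields may be given in any order in a struct expression.
    let p = Point { y: 2, x: 1 };
    let Meters(len) = Meters(3.5);
    let _only_value = Marker;

    println!("{}", describe(&Shape::Circle(p, len)));
    println!("{}", describe(&Shape::Segment(Point { x: 0, y: 0 }, Point { x: 1, y: 1 })));
    println!("{}", describe(&Shape::Empty));
    // An enum value must be able to hold its largest variant.
    println!("{} >= {}", size_of::<Shape>(), size_of::<(Point, Point)>());
}
```

The size comparison only shows that the enum is at least as large as the payload of its largest variant; the exact size, including any discriminant, is left to the compiler.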
### Recursive types

Nominal types ([enumerations](#enumerated-types) and
[structures](#structure-types)) may be recursive. That is, each `enum`
constructor or `struct` field may refer, directly or indirectly, to the
enclosing `enum` or `struct` type itself. Such recursion has restrictions:

* Recursive types must include a nominal type in the recursion
  (not mere [type definitions](#type-definitions),
   or other structural types such as [arrays](#array,-and-slice-types) or [tuples](#tuple-types)).
* A recursive `enum` item must have at least one non-recursive constructor
  (in order to give the recursion a basis case).
* The size of a recursive type must be finite;
  in other words the recursive fields of the type must be [pointer types](#pointer-types).
* Recursive type definitions can cross module boundaries, but not module *visibility* boundaries,
  or crate boundaries (in order to simplify the module system and type checker).

An example of a *recursive* type and its use:

```
enum List<T> {
  Nil,
  Cons(T, Box<List<T>>)
}

let a: List<int> = List::Cons(7, box List::Cons(13, box List::Nil));
```

### Pointer types

All pointers in Rust are explicit first-class values. They can be copied,
stored into data structures, and returned from functions. There are two
varieties of pointer in Rust:

* References (`&`)
  : These point to memory _owned by some other value_.
    A reference type is written `&type`,
    or `&'a type` when you need an explicit lifetime `'a`.
    Copying a reference is a "shallow" operation:
    it involves only copying the pointer itself.
    Releasing a reference typically has no effect on the value it points to,
    with the exception of temporary values, which are released when the last
    reference to them is released.

* Raw pointers (`*`)
  : Raw pointers are pointers without safety or liveness guarantees.
    Raw pointers are written as `*const T` or `*mut T`,
    for example `*const int` means a raw pointer to an integer.
    Copying or dropping a raw pointer has no effect on the lifecycle of any
    other value. Dereferencing a raw pointer or converting it to any other
    pointer type is an [`unsafe` operation](#unsafe-functions).
    Raw pointers are generally discouraged in Rust code;
    they exist to support interoperability with foreign code,
    and writing performance-critical or low-level functions.

The standard library contains additional 'smart pointer' types beyond references
and raw pointers.

### Function types

The function type constructor `fn` forms new function types. A function type
consists of a possibly-empty set of function-type modifiers (such as `unsafe`
or `extern`), a sequence of input types and an output type.

An example of a `fn` type:

```
fn add(x: int, y: int) -> int {
  return x + y;
}

let mut x = add(5,7);

type Binop<'a> = |int,int|: 'a -> int;
let bo: Binop = add;
x = bo(5,7);
```

### Closure types

```{.ebnf .notation}
closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
                [ ':' bound-list ] [ '->' type ]
procedure_type := 'proc' [ '<' lifetime-list '>' ] '(' arg-list ')'
                  [ ':' bound-list ] [ '->' type ]
lifetime-list := lifetime | lifetime ',' lifetime-list
arg-list := ident ':' type | ident ':' type ',' arg-list
bound-list := bound | bound '+' bound-list
bound := path | lifetime
```

The type of a closure mapping an input of type `A` to an output of type `B` is
`|A| -> B`. A closure with no arguments or return values has type `||`.
Similarly, a procedure mapping `A` to `B` is `proc(A) -> B` and a no-argument
and no-return value procedure has type `proc()`.
An example of creating and calling a closure:

```rust
let captured_var = 10i;

let closure_no_args = || println!("captured_var={}", captured_var);

let closure_args = |arg: int| -> int {
  println!("captured_var={}, arg={}", captured_var, arg);
  arg // Note lack of semicolon after 'arg'
};

fn call_closure(c1: ||, c2: |int| -> int) {
  c1();
  c2(2);
}

call_closure(closure_no_args, closure_args);
```

Unlike closures, procedures may only be invoked once, but own their
environment, and are allowed to move out of their environment. Procedures are
allocated on the heap (unlike closures). An example of creating and calling a
procedure:

```rust
let string = "Hello".to_string();

// Creates a new procedure, passing it to the `spawn` function.
spawn(proc() {
  println!("{} world!", string);
});

// the variable `string` has been moved into the previous procedure, so it is
// no longer usable.

// Create and invoke a procedure. Note that the procedure is *moved* when
// invoked, so it cannot be invoked again.
let f = proc(n: int) { n + 22 };
println!("answer: {}", f(20));
```

### Object types

Every trait item (see [traits](#traits)) defines a type with the same name as
the trait. This type is called the _object type_ of the trait. Object types
permit "late binding" of methods, dispatched using _virtual method tables_
("vtables"). Whereas most calls to trait methods are "early bound" (statically
resolved) to specific implementations at compile time, a call to a method on an
object type is only resolved to a vtable entry at compile time. The actual
implementation for each vtable entry can vary on an object-by-object basis.

Given a pointer-typed expression `E` of type `&T` or `Box<T>`, where `T`
implements trait `R`, casting `E` to the corresponding pointer type `&R` or
`Box<R>` results in a value of the _object type_ `R`. This result is
represented as a pair of pointers: the vtable pointer for the `T`
implementation of `R`, and the pointer value of `E`.
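The pair-of-pointers representation can be observed from a program. The sketch below uses present-day Rust, where object types are spelled with the `dyn` keyword (newer syntax than this document describes), and the names `HasArea`, `Square`, `Rect` and `total_area` are hypothetical. The size comparison reflects how current compilers lay out such pointers and is shown as an observation, not as something this grammar specifies.

```rust
use std::mem::size_of;

// Hypothetical trait and types, for illustration only.
trait HasArea {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl HasArea for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
}

impl HasArea for Rect {
    fn area(&self) -> f64 { self.0 * self.1 }
}

/// Late-bound calls: each `area` call is dispatched through the vtable
/// carried alongside the data pointer of the object.
fn total_area(shapes: &[Box<dyn HasArea>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    // Assumption: `dyn` syntax from present-day Rust.
    let shapes: Vec<Box<dyn HasArea>> =
        vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    println!("total area = {}", total_area(&shapes));

    // A plain reference is one pointer wide; an object reference carries
    // a data pointer plus a vtable pointer.
    println!(
        "thin = {} bytes, object = {} bytes",
        size_of::<&Square>(),
        size_of::<&dyn HasArea>()
    );
}
```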
An example of an object type:

```
trait Printable {
  fn stringify(&self) -> String;
}

impl Printable for int {
  fn stringify(&self) -> String { self.to_string() }
}

fn print(a: Box<Printable>) {
   println!("{}", a.stringify());
}

fn main() {
   print(box 10i as Box<Printable>);
}
```

In this example, the trait `Printable` occurs as an object type in both the
type signature of `print`, and the cast expression in `main`.

### Type parameters

Within the body of an item that has type parameter declarations, the names of
its type parameters are types:

```ignore
fn map<A: Clone, B: Clone>(f: |A| -> B, xs: &[A]) -> Vec<B> {
    if xs.len() == 0 {
       return vec![];
    }
    let first: B = f(xs[0].clone());
    let mut rest: Vec<B> = map(f, xs.slice(1, xs.len()));
    rest.insert(0, first);
    return rest;
}
```

Here, `first` has type `B`, referring to `map`'s `B` type parameter; and `rest`
has type `Vec<B>`, a vector type with element type `B`.

### Self types

The special type `self` has a meaning within methods inside an impl item. It
refers to the type of the implicit `self` argument. For example, in:

```
trait Printable {
  fn make_string(&self) -> String;
}

impl Printable for String {
    fn make_string(&self) -> String {
        (*self).clone()
    }
}
```

`self` refers to the value of type `String` that is the receiver for a call to
the method `make_string`.

## Type kinds

Types in Rust are categorized into kinds, based on various properties of the
components of the type. The kinds are:

* `Send`
  : Types of this kind can be safely sent between tasks.
    This kind includes scalars, boxes, procs, and
    structural types containing only other owned types.
    All `Send` types are `'static`.
* `Copy`
  : Types of this kind consist of "Plain Old Data"
    which can be copied by simply moving bits.
    All values of this kind can be implicitly copied.
    This kind includes scalars and immutable references,
    as well as structural types containing other `Copy` types.
* `'static`
  : Types of this kind do not contain any references (except for
    references with the `static` lifetime, which are allowed).
    This can be a useful guarantee for code
    that breaks borrowing assumptions
    using [`unsafe` operations](#unsafe-functions).
* `Drop`
  : This is not strictly a kind,
    but its presence interacts with kinds:
    the `Drop` trait provides a single method `drop`
    that takes no parameters,
    and is run when values of the type are dropped.
    Such a method is called a "destructor".
    Destructors are always executed in "top-down" order:
    a value is completely destroyed
    before any of the values it owns run their destructors.
    Only `Send` types can implement `Drop`.

* _Default_
  : Types with destructors, closure environments,
    and various other _non-first-class_ types,
    are not copyable at all.
    Such types can usually only be accessed through pointers,
    or in some cases, moved between mutable locations.

Kinds can be supplied as _bounds_ on type parameters, like traits, in which
case the parameter is constrained to types satisfying that kind.

By default, type parameters do not carry any assumed kind-bounds at all. When
instantiating a type parameter, the kind bounds on the parameter are checked to
be the same or narrower than the kind of the type that it is instantiated with.

Sending operations are not part of the Rust language, but are implemented in
the library. Generic functions that send values bound the kind of these values
to sendable.

# Memory and concurrency models

Rust has a memory model centered around concurrently-executing _tasks_. Thus
its memory model and its concurrency model are best discussed simultaneously,
as parts of each only make sense when considered from the perspective of the
other.

When reading about the memory model, keep in mind that it is partitioned in
order to support tasks; and when reading about tasks, keep in mind that their
isolation and communication mechanisms are only possible due to the ownership
and lifetime semantics of the memory model.
## Memory model

A Rust program's memory consists of a static set of *items*, a set of
[tasks](#tasks) each with its own *stack*, and a *heap*. Immutable portions of
the heap may be shared between tasks; mutable portions may not.

Allocations in the stack consist of *slots*, and allocations in the heap
consist of *boxes*.

### Memory allocation and lifetime

The _items_ of a program are those functions, modules and types that have their
value calculated at compile-time and stored uniquely in the memory image of the
Rust process. Items are neither dynamically allocated nor freed.

A task's _stack_ consists of activation frames automatically allocated on entry
to each function as the task executes. A stack allocation is reclaimed when
control leaves the frame containing it.

The _heap_ is a general term that describes boxes. The lifetime of an
allocation in the heap depends on the lifetime of the box values pointing to
it. Since box values may themselves be passed in and out of frames, or stored
in the heap, heap allocations may outlive the frame they are allocated within.

### Memory ownership

A task owns all memory it can *safely* reach through local variables, as well
as boxes and references.

When a task sends a value that has the `Send` trait to another task, it loses
ownership of the value sent and can no longer refer to it. This is statically
guaranteed by the combined use of "move semantics", and the compiler-checked
_meaning_ of the `Send` trait: it is only instantiated for (transitively)
sendable kinds of data constructor and pointers, never including references.

When a stack frame is exited, its local allocations are all released, and its
references to boxes are dropped.

When a task finishes, its stack is necessarily empty and it therefore has no
references to any boxes; the remainder of its heap is immediately freed.

### Memory slots

A task's stack contains slots.
A _slot_ is a component of a stack frame, either a function parameter, a [temporary](#lvalues,-rvalues-and-temporaries), or a local variable.

A _local variable_ (or *stack-local* allocation) holds a value directly, allocated within the stack's memory. The value is a part of the stack frame.

Local variables are immutable unless declared otherwise, as in `let mut x = ...`.

Function parameters are immutable unless declared with `mut`. The `mut` keyword applies only to the following parameter (so `|mut x, y|` and `fn f(mut x: Box<int>, y: Box<int>)` declare one mutable variable `x` and one immutable variable `y`).

Methods that take either `self` or `Box<Self>` can optionally place them in a mutable slot by prefixing them with `mut` (similar to regular arguments):

```
trait Changer {
    // `self` is taken by value and placed in a mutable slot.
    fn change(mut self) -> Self;
    // The boxed receiver is likewise placed in a mutable slot.
    fn modify(mut self: Box<Self>) -> Box<Self>;
}
```

Local variables are not initialized when allocated; the entire frame's worth of local variables is allocated at once, on frame-entry, in an uninitialized state. Subsequent statements within a function may or may not initialize the local variables. Local variables can be used only after they have been initialized; this is enforced by the compiler.

### Boxes

A _box_ is a reference to a heap allocation holding another value, which is constructed by the prefix operator `box`. When the standard library is in use, the type of a box is `std::owned::Box<T>`.

An example of a box type and value:

```
let x: Box<int> = box 10;
```

Box values exist in 1:1 correspondence with their heap allocation; copying a box value makes a shallow copy of the pointer. Rust will consider a shallow copy of a box to move ownership of the value. After a value has been moved, the source location cannot be used unless it is reinitialized.

```
let x: Box<int> = box 10;
let y = x;
// attempting to use `x` will result in an error here
```

## Tasks

An executing Rust program consists of a tree of tasks.
A Rust _task_ consists of an entry function, a stack, a set of outgoing communication channels and incoming communication ports, and ownership of some portion of the heap of a single operating-system process.

### Communication between tasks

Rust tasks are isolated and generally unable to interfere with one another's memory directly, except through [`unsafe` code](#unsafe-functions). All contact between tasks is mediated by safe forms of ownership transfer, and data races on memory are prohibited by the type system.

When you wish to send data between tasks, the values are restricted to the [`Send` type-kind](#type-kinds). Restricting communication interfaces to this kind ensures that no references move between tasks. Thus access to an entire data structure can be mediated through its owning "root" value; no further locking or copying is required to avoid data races within the substructure of such a value.

### Task lifecycle

The _lifecycle_ of a task consists of a finite set of states and events that cause transitions between the states. The lifecycle states of a task are:

* running
* blocked
* panicked
* dead

A task begins its lifecycle, once it has been spawned, in the *running* state. In this state it executes the statements of its entry function, and any functions called by the entry function.

A task may transition from the *running* state to the *blocked* state any time it makes a blocking communication call. When the call can be completed (when a message arrives at a sender, or a buffer opens to receive a message), the blocked task will unblock and transition back to *running*.

A task may transition to the *panicked* state at any time, either because it is killed by some external event or, internally, because a `panic!()` macro is evaluated. Once *panicking*, a task unwinds its stack and transitions to the *dead* state. Unwinding the stack of a task is done by the task itself, on its own control stack.
If a value with a destructor is freed during unwinding, the code for the destructor is run, also on the task's control stack. Running the destructor code causes a temporary transition to a *running* state, and allows the destructor code to cause any subsequent state transitions. The original task of unwinding and panicking thereby may suspend temporarily, and may involve (recursive) unwinding of the stack of a failed destructor. Nonetheless, the outermost unwinding activity will continue until the stack is unwound and the task transitions to the *dead* state.

There is no way to "recover" from task panics. Once a task has temporarily suspended its unwinding in the *panicking* state, a panic occurring from within this destructor results in a *hard* panic. A hard panic currently results in the process aborting.

A task in the *dead* state cannot transition to other states; it exists only to have its termination status inspected by other tasks, and/or to await reclamation when the last reference to it drops.

# Runtime services, linkage and debugging

The Rust _runtime_ is a relatively compact collection of Rust code that provides fundamental services and datatypes to all Rust tasks at run-time. It is smaller and simpler than many modern language runtimes. It is tightly integrated into the language's execution model of memory, tasks, communication and logging.

### Memory allocation

The runtime memory-management system is based on a _service-provider interface_, through which the runtime requests blocks of memory from its environment and releases them back to its environment when they are no longer needed. The default implementation of the service-provider interface consists of the C runtime functions `malloc` and `free`.

The runtime memory-management system, in turn, supplies Rust tasks with facilities for allocating and releasing stacks, as well as allocating and freeing heap data.
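
The idea of a service-provider interface for memory can be illustrated with a small sketch. It does not reproduce the runtime's actual interface; it assumes the `std::alloc::GlobalAlloc` trait from present-day Rust as an analogous hook, with `System` standing in for `malloc` and `free`. `CountingProvider` and `request_and_release` are hypothetical names introduced for illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical provider: forwards every request to the system allocator
/// and counts how many blocks are currently outstanding.
struct CountingProvider {
    live_blocks: AtomicUsize,
}

impl CountingProvider {
    const fn new() -> Self {
        CountingProvider { live_blocks: AtomicUsize::new(0) }
    }

    fn live_blocks(&self) -> usize {
        self.live_blocks.load(Ordering::SeqCst)
    }
}

unsafe impl GlobalAlloc for CountingProvider {
    // Request a block of memory from the environment.
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            self.live_blocks.fetch_add(1, Ordering::SeqCst);
        }
        ptr
    }

    // Release a block back to the environment.
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.live_blocks.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Requests one block, writes to it, and releases it again.
/// Returns the live-block count while the block is held and after release.
fn request_and_release(provider: &CountingProvider) -> (usize, usize) {
    let layout = Layout::new::<u64>();
    unsafe {
        let ptr = provider.alloc(layout);
        assert!(!ptr.is_null(), "allocation failed");
        (ptr as *mut u64).write(10);
        let while_held = provider.live_blocks();
        provider.dealloc(ptr, layout);
        (while_held, provider.live_blocks())
    }
}

fn main() {
    let provider = CountingProvider::new();
    assert_eq!(request_and_release(&provider), (1, 0));
}
```

Keeping the interface to a request operation and a release operation is what makes the provider replaceable: the code that uses memory never names the underlying allocator directly.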
### Built in types

The runtime provides C and Rust code to assist with various built-in types, such as arrays, strings, and the low-level communication system (ports, channels, tasks).

Support for other built-in types such as simple types, tuples and enums is open-coded by the Rust compiler.

### Task scheduling and communication

The runtime provides code to manage inter-task communication. This includes the system of task-lifecycle state transitions depending on the contents of queues, as well as code to copy values between queues and their recipients and to serialize values for transmission over operating-system inter-process communication facilities.

### Linkage

The Rust compiler supports various methods to link crates together both statically and dynamically. This section explores these methods; more information about native libraries can be found in the [ffi guide][ffi].

In one session of compilation, the compiler can generate multiple artifacts through the usage of either command line flags or the `crate_type` attribute. If one or more command line flags are specified, all `crate_type` attributes will be ignored in favor of only building the artifacts specified by command line.

* `--crate-type=bin`, `#[crate_type = "bin"]` - A runnable executable will be produced. This requires that there is a `main` function in the crate which will be run when the program begins executing. This will link in all Rust and native dependencies, producing a distributable binary.

* `--crate-type=lib`, `#[crate_type = "lib"]` - A Rust library will be produced. This is an ambiguous concept as to what exactly is produced because a library can manifest itself in several forms. The purpose of this generic `lib` option is to generate the "compiler recommended" style of library. The output library will always be usable by rustc, but the actual type of library may change from time-to-time.
The remaining output types are all different flavors of libraries, and the `lib` type can be seen as an alias for one of them (but the actual one is compiler-defined).

* `--crate-type=dylib`, `#[crate_type = "dylib"]` - A dynamic Rust library will be produced. This is different from the `lib` output type in that this forces dynamic library generation. The resulting dynamic library can be used as a dependency for other libraries and/or executables. This output type will create `*.so` files on Linux, `*.dylib` files on OS X, and `*.dll` files on Windows.

* `--crate-type=staticlib`, `#[crate_type = "staticlib"]` - A static system library will be produced. This is different from other library outputs in that the Rust compiler will never attempt to link to `staticlib` outputs. The purpose of this output type is to create a static library containing all of the local crate's code along with all upstream dependencies. The static library is actually a `*.a` archive on Linux and OS X and a `*.lib` file on Windows. This format is recommended for use in situations such as linking Rust code into an existing non-Rust application because it will not have dynamic dependencies on other Rust code.

* `--crate-type=rlib`, `#[crate_type = "rlib"]` - A "Rust library" file will be produced. This is used as an intermediate artifact and can be thought of as a "static Rust library". These `rlib` files, unlike `staticlib` files, are interpreted by the Rust compiler in future linkage. This essentially means that `rustc` will look for metadata in `rlib` files like it looks for metadata in dynamic libraries. This form of output is used to produce statically linked executables as well as `staticlib` outputs.

Note that these outputs are stackable in the sense that if multiple are specified, then the compiler will produce each form of output at once without having to recompile. However, this only applies for outputs specified by the same method.
If only `crate_type` attributes are specified, then they will all be built, but if one or more `--crate-type` command line flags are specified, then only those outputs will be built.

With all these different kinds of outputs, if crate A depends on crate B, then the compiler could find B in various different forms throughout the system. The only forms looked for by the compiler, however, are the `rlib` format and the dynamic library format. With these two options for a dependent library, the compiler must at some point make a choice between these two formats. With this in mind, the compiler follows these rules when determining what format of dependencies will be used:

1. If a static library is being produced, all upstream dependencies are required to be available in `rlib` formats. This requirement stems from the fact that a dynamic library cannot be converted into a static format.

   Note that it is impossible to link in native dynamic dependencies to a static library, and in this case warnings will be printed about all unlinked native dynamic dependencies.

2. If an `rlib` file is being produced, then there are no restrictions on what format the upstream dependencies are available in. It is simply required that all upstream dependencies be available for reading metadata from.

   The reason for this is that `rlib` files do not contain any of their upstream dependencies. It wouldn't be very efficient for all `rlib` files to contain a copy of `libstd.rlib`!

3. If an executable is being produced and the `-C prefer-dynamic` flag is not specified, then the compiler first attempts to find dependencies in the `rlib` format. If some dependencies are not available in an rlib format, then dynamic linking is attempted (see below).

4. If a dynamic library or an executable that is being dynamically linked is being produced, then the compiler will attempt to reconcile the available dependencies in either the rlib or dylib format to create a final product.
A major goal of the compiler is to ensure that a library never appears more than once in any artifact. For example, if dynamic libraries B and C were each statically linked to library A, then a crate could not link to B and C together because there would be two copies of A. The compiler allows mixing the rlib and dylib formats, but this restriction must be satisfied.

   The compiler currently implements no method of hinting what format a library should be linked with. When dynamically linking, the compiler will attempt to maximize dynamic dependencies while still allowing some dependencies to be linked in via an rlib.

   For most situations, having all libraries available as a dylib is recommended if dynamically linking. For other situations, the compiler will emit a warning if it is unable to determine which formats to link each library with.

In general, `--crate-type=bin` or `--crate-type=lib` should be sufficient for all compilation needs, and the other options are just available if more fine-grained control is desired over the output format of a Rust crate.

# Appendix: Rationales and design tradeoffs

*TODO*.

# Appendix: Influences and further references

## Influences

> The essential problem that must be solved in making a fault-tolerant
> software system is therefore that of fault-isolation. Different programmers
> will write different modules, some modules will be correct, others will have
> errors. We do not want the errors in one module to adversely affect the
> behaviour of a module which does not have any errors.
>
> — Joe Armstrong

> In our approach, all data is private to some process, and processes can
> only communicate through communications channels. *Security*, as used
> in this paper, is the property which guarantees that processes in a system
> cannot affect each other except by explicit communication.
> > When security is absent, nothing which can be proven about a single module > in isolation can be guaranteed to hold when that module is embedded in a > system [...] > > — Robert Strom and Shaula Yemini > Concurrent and applicative programming complement each other. The > ability to send messages on channels provides I/O without side effects, > while the avoidance of shared data helps keep concurrent processes from > colliding. > > — Rob Pike Rust is not a particularly original language. It may however appear unusual by contemporary standards, as its design elements are drawn from a number of "historical" languages that have, with a few exceptions, fallen out of favour. Five prominent lineages contribute the most, though their influences have come and gone during the course of Rust's development: * The NIL (1981) and Hermes (1990) family. These languages were developed by Robert Strom, Shaula Yemini, David Bacon and others in their group at IBM Watson Research Center (Yorktown Heights, NY, USA). * The Erlang (1987) language, developed by Joe Armstrong, Robert Virding, Claes Wikström, Mike Williams and others in their group at the Ericsson Computer Science Laboratory (Älvsjö, Stockholm, Sweden) . * The Sather (1990) language, developed by Stephen Omohundro, Chu-Cheow Lim, Heinz Schmidt and others in their group at The International Computer Science Institute of the University of California, Berkeley (Berkeley, CA, USA). * The Newsqueak (1988), Alef (1995), and Limbo (1996) family. These languages were developed by Rob Pike, Phil Winterbottom, Sean Dorward and others in their group at Bell Labs Computing Sciences Research Center (Murray Hill, NJ, USA). * The Napier (1985) and Napier88 (1988) family. These languages were developed by Malcolm Atkinson, Ron Morrison and others in their group at the University of St. Andrews (St. Andrews, Fife, UK). 
Additional specific influences can be seen from the following languages:

* The structural algebraic types and compilation manager of SML.
* The attribute and assembly systems of C#.
* The references and deterministic destructor system of C++.
* The memory region systems of the ML Kit and Cyclone.
* The typeclass system of Haskell.
* The lexical identifier rule of Python.
* The block syntax of Ruby.

[ffi]: guide-ffi.html
[plugin]: guide-plugin.html