% Grammar

# Introduction

This document is the primary reference for the Rust programming language grammar. It
provides only one kind of material:

  - Chapters that formally define the language grammar.

This document does not serve as an introduction to the language. Background
familiarity with the language is assumed. A separate [guide] is available to
help acquire such background familiarity.

This document also does not serve as a reference to the [standard] library
included in the language distribution. Those libraries are documented
separately by extracting documentation attributes from their source code. Many
of the features that one might expect to be language features are library
features in Rust, so what you're looking for may be there, not here.

[guide]: guide.html
[standard]: std/index.html

# Notation

Rust's grammar is defined over Unicode codepoints, each conventionally denoted
`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is
confined to the ASCII range of Unicode, and is described in this document by a
dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF
supported by common automated LL(k) parsing tools such as `llgen`, rather than
the dialect given in ISO 14977. The dialect can be defined self-referentially
as follows:

```antlr
grammar : rule + ;
rule    : nonterminal ':' productionrule ';' ;
productionrule : production [ '|' production ] * ;
production : term * ;
term : element repeats ;
element : LITERAL | IDENTIFIER | '[' productionrule ']' ;
repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ;
```

Where:

- Whitespace in the grammar is ignored.
- Square brackets are used to group rules.
- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal
  ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding
  Unicode codepoint `U+00QQ`.
- `IDENTIFIER` is a nonempty string of ASCII letters and underscores.
- The `repeats` forms apply to the adjacent `element`, and are as follows:
  - `?` means zero or one repetition
  - `*` means zero or more repetitions
  - `+` means one or more repetitions
  - NUMBER trailing a repeat symbol gives a maximum repetition count
  - NUMBER on its own gives an exact repetition count

This EBNF dialect should be familiar to many readers.

## Unicode productions

A few productions in Rust's grammar permit Unicode codepoints outside the ASCII
range. We define these productions in terms of character properties specified
in the Unicode standard, rather than in terms of ASCII-range codepoints. The
section [Special Unicode Productions](#special-unicode-productions) lists these
productions.

## String table productions

Some rules in the grammar, notably [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), and [keywords](#keywords), are
given in a simplified form: as a listing of a table of unquoted, printable
whitespace-separated strings. These cases form a subset of the rules regarding
the [token](#tokens) rule, and are assumed to be the result of a
lexical-analysis phase feeding the parser, driven by a DFA, operating over the
disjunction of all such string table entries.

When such a string enclosed in double-quotes (`"`) occurs inside the grammar,
it is an implicit reference to a single member of such a string table
production. See [tokens](#tokens) for more information.

# Lexical structure

## Input format

Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8.
Most Rust grammar rules are defined in terms of printable ASCII-range
codepoints, but a small number are defined in terms of Unicode properties or
explicit codepoint lists. [^inputformat]

[^inputformat]: Substitute definitions for the special Unicode productions are
  provided to the grammar verifier, restricted to ASCII range, when verifying the
  grammar in this document.
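As a rough illustration of the input format described above, the following Rust
sketch decodes a byte buffer as UTF-8 and renders each codepoint in the `U+XXXX`
notation used in this document. The function name `codepoints` is hypothetical
and is not part of any compiler API. It also assumes that Rust's `char` (a
Unicode scalar value) is an adequate stand-in for "codepoint" here, since
well-formed UTF-8 cannot encode surrogates.

```rust
use std::str::Utf8Error;

/// Decode `bytes` as UTF-8 and render every codepoint as `U+XXXX`,
/// using at least four uppercase hexadecimal digits.
/// Hypothetical helper, for illustration only.
pub fn codepoints(bytes: &[u8]) -> Result<Vec<String>, Utf8Error> {
    // Input that is not well-formed UTF-8 is rejected outright.
    let text = std::str::from_utf8(bytes)?;
    Ok(text
        .chars()
        .map(|c| format!("U+{:04X}", c as u32))
        .collect())
}
```

A caller would pass the raw contents of a source file; an `Err` value means the
bytes cannot be interpreted as Rust input at all.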
## Special Unicode Productions

The following productions in the Rust grammar are defined in terms of Unicode
properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and
`non_double_quote`.

### Identifiers

The `ident` production is any nonempty Unicode[^non_ascii_idents] string that
does _not_ occur in the set of [keywords](#keywords) and has the following
form:

- The first character has property `XID_start`
- The remaining characters have property `XID_continue`

[^non_ascii_idents]: Non-ASCII characters in identifiers are currently feature
  gated. This is expected to improve soon.

> **Note**: `XID_start` and `XID_continue` as character properties cover the
> character ranges used to form the more familiar C and Java language-family
> identifiers.

### Delimiter-restricted productions

Some productions are defined by exclusion of particular Unicode characters:

- `non_null` is any single Unicode character aside from `U+0000` (null)
- `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`)
- `non_single_quote` is `non_null` restricted to exclude `U+0027` (`'`)
- `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`)

## Comments

```antlr
comment : block_comment | line_comment ;
block_comment : "/*" block_comment_body * "*/" ;
block_comment_body : [block_comment | character] * ;
line_comment : "//" non_eol * ;
```

**FIXME:** add doc grammar?

## Whitespace

```antlr
whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ;
whitespace : [ whitespace_char | comment ] + ;
```

## Tokens

```antlr
simple_token : keyword | unop | binop ;
token : simple_token | ident | literal | symbol | whitespace token ;
```

### Keywords
|          |          |          |          |         |
|----------|----------|----------|----------|---------|
| abstract | alignof  | as       | become   | box     |
| break    | const    | continue | crate    | do      |
| else     | enum     | extern   | false    | final   |
| fn       | for      | if       | impl     | in      |
| let      | loop     | macro    | match    | mod     |
| move     | mut      | offsetof | override | priv    |
| proc     | pub      | pure     | ref      | return  |
| Self     | self     | sizeof   | static   | struct  |
| super    | trait    | true     | type     | typeof  |
| unsafe   | unsized  | use      | virtual  | where   |
| while    | yield    |          |          |         |

Each of these keywords has special meaning in the grammar, and all of them are
excluded from the `ident` rule.

### Literals

```antlr
lit_suffix : ident;
literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?;
```

The optional `lit_suffix` production is only used for certain numeric literals,
but is reserved for future extension. That is, the above gives the lexical
grammar, but a Rust parser will reject everything but the 12 special cases
mentioned in [Number literals](reference.html#number-literals) in the
reference.
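The split between what the lexical grammar admits and what a parser accepts can
be sketched roughly as follows. This is only an illustration under stated
assumptions: it handles plain decimal literals only, the names
`NUMERIC_SUFFIXES`, `split_decimal_literal` and `parser_accepts` are
hypothetical, and the list of twelve suffixes is assumed to be the ten integer
and two floating-point type names; the reference remains the authority on the
actual set.

```rust
/// Assumed set of the twelve suffixes accepted on number literals.
/// This list is an assumption made for illustration, not taken from this document.
const NUMERIC_SUFFIXES: [&str; 12] = [
    "u8", "i8", "u16", "i16", "u32", "i32", "u64", "i64",
    "usize", "isize", "f32", "f64",
];

/// Split a decimal literal into its body and an optional suffix.
/// Returns `None` when the text does not start with a decimal digit.
pub fn split_decimal_literal(lit: &str) -> Option<(&str, Option<&str>)> {
    // The body is the longest prefix made of decimal digits and underscores.
    let end = lit
        .find(|c: char| !(c.is_ascii_digit() || c == '_'))
        .unwrap_or(lit.len());
    let (body, suffix) = lit.split_at(end);
    // `num_lit` always begins with a digit, so reject an empty or `_`-led body.
    if !body.starts_with(|c: char| c.is_ascii_digit()) {
        return None;
    }
    Some((body, if suffix.is_empty() { None } else { Some(suffix) }))
}

/// Lexically any suffix is allowed; this check mimics the parser-side
/// restriction to the assumed suffix set.
pub fn parser_accepts(lit: &str) -> bool {
    match split_decimal_literal(lit) {
        Some((_, None)) => true,
        Some((_, Some(suffix))) => NUMERIC_SUFFIXES.contains(&suffix),
        None => false,
    }
}
```

Under these assumptions `42foo` still splits into a body and a suffix, matching
the lexical rule `literal`, but `parser_accepts` rejects it because `foo` is not
in the suffix list.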
#### Character and string literals

```antlr
char_lit : '\x27' char_body '\x27' ;
string_lit : '"' string_body * '"' | 'r' raw_string ;

char_body : non_single_quote
          | '\x5c' [ '\x27' | common_escape | unicode_escape ] ;

string_body : non_double_quote
            | '\x5c' [ '\x22' | common_escape | unicode_escape ] ;
raw_string : '"' raw_string_body '"' | '#' raw_string '#' ;

common_escape : '\x5c'
              | 'n' | 'r' | 't' | '0'
              | 'x' hex_digit 2 ;
unicode_escape : 'u' '{' hex_digit+ 6 '}';

hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
          | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
          | dec_digit ;
oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ;
dec_digit : '0' | nonzero_dec ;
nonzero_dec: '1' | '2' | '3' | '4'
           | '5' | '6' | '7' | '8' | '9' ;
```

#### Byte and byte string literals

```antlr
byte_lit : "b\x27" byte_body '\x27' ;
byte_string_lit : "b\x22" byte_string_body * '\x22' | "br" raw_byte_string ;

byte_body : ascii_non_single_quote
          | '\x5c' [ '\x27' | common_escape ] ;

byte_string_body : ascii_non_double_quote
            | '\x5c' [ '\x22' | common_escape ] ;
raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ;
```

#### Number literals

```antlr
num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ?
        | '0' [       [ dec_digit | '_' ] * float_suffix ?
              | 'b'   [ '1' | '0' | '_' ] +
              | 'o'   [ oct_digit | '_' ] +
              | 'x'   [ hex_digit | '_' ] +  ] ;

float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ;

exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ;
dec_lit : [ dec_digit | '_' ] + ;
```

#### Boolean literals

```antlr
bool_lit : [ "true" | "false" ] ;
```

The two values of the boolean type are written `true` and `false`.

### Symbols

```antlr
symbol : "::" | "->"
       | '#' | '[' | ']' | '(' | ')' | '{' | '}'
       | ',' | ';' ;
```

Symbols are a general class of printable [tokens](#tokens) that play structural
roles in a variety of grammar productions.
They are catalogued here for completeness as the set of remaining miscellaneous
printable tokens that do not otherwise appear as [unary
operators](#unary-operator-expressions), [binary
operators](#binary-operator-expressions), or [keywords](#keywords).

## Paths

```antlr
expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ;
expr_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | expr_path ;

type_path : ident [ type_path_tail ] + ;
type_path_tail : '<' type_expr [ ',' type_expr ] + '>'
               | "::" type_path ;
```

# Syntax extensions

## Macros

```antlr
expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';'
                 | "macro_rules" '!' ident '{' macro_rule * '}' ;
macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ;
matcher : '(' matcher * ')' | '[' matcher * ']'
        | '{' matcher * '}' | '$' ident ':' ident
        | '$' '(' matcher * ')' sep_token? [ '*' | '+' ]
        | non_special_token ;
transcriber : '(' transcriber * ')' | '[' transcriber * ']'
            | '{' transcriber * '}' | '$' ident
            | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ]
            | non_special_token ;
```

# Crates and source files

**FIXME:** grammar? What production covers `#![crate_id = "foo"]`?

# Items and attributes

**FIXME:** grammar?

## Items

```antlr
item : vis ? mod_item | fn_item | type_item | struct_item | enum_item
     | const_item | static_item | trait_item | impl_item | extern_block ;
```

### Type Parameters

**FIXME:** grammar?

### Modules

```antlr
mod_item : "mod" ident [ ';' | '{' mod '}' ] ;
mod : [ view_item | item ] * ;
```

#### View items

```antlr
view_item : extern_crate_decl | use_decl ';' ;
```

##### Extern crate declarations

```antlr
extern_crate_decl : "extern" "crate" crate_name ;
crate_name : ident | [ ident "as" ident ] ;
```

##### Use declarations

```antlr
use_decl : vis ? "use" [ path "as" ident
                       | path_glob ] ;

path_glob : ident [ "::" [ path_glob
                          | '*' ] ] ?
          | '{' path_item [ ',' path_item ] * '}' ;

path_item : ident | "self" ;
```

### Functions

**FIXME:** grammar?
#### Generic functions

**FIXME:** grammar?

#### Unsafety

**FIXME:** grammar?

##### Unsafe functions

**FIXME:** grammar?

##### Unsafe blocks

**FIXME:** grammar?

#### Diverging functions

**FIXME:** grammar?

### Type definitions

**FIXME:** grammar?

### Structures

**FIXME:** grammar?

### Enumerations

**FIXME:** grammar?

### Constant items

```antlr
const_item : "const" ident ':' type '=' expr ';' ;
```

### Static items

```antlr
static_item : "static" ident ':' type '=' expr ';' ;
```

#### Mutable statics

**FIXME:** grammar?

### Traits

**FIXME:** grammar?

### Implementations

**FIXME:** grammar?

### External blocks

```antlr
extern_block_item : "extern" '{' extern_block '}' ;
extern_block : [ foreign_fn ] * ;
```

## Visibility and Privacy

```antlr
vis : "pub" ;
```

### Re-exporting and Visibility

See [Use declarations](#use-declarations).

## Attributes

```antlr
attribute : '#' '!' ? '[' meta_item ']' ;
meta_item : ident [ '=' literal
                  | '(' meta_seq ')' ] ? ;
meta_seq : meta_item [ ',' meta_seq ] ? ;
```

# Statements and expressions

## Statements

```antlr
stmt : decl_stmt | expr_stmt ;
```

### Declaration statements

```antlr
decl_stmt : item | let_decl ;
```

#### Item declarations

See [Items](#items).

#### Variable declarations

```antlr
let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
init : [ '=' ] expr ;
```

### Expression statements

```antlr
expr_stmt : expr ';' ;
```

## Expressions

```antlr
expr : literal | path | tuple_expr | unit_expr | struct_expr
     | block_expr | method_call_expr | field_expr | array_expr
     | idx_expr | range_expr | unop_expr | binop_expr
     | paren_expr | call_expr | lambda_expr | while_expr
     | loop_expr | break_expr | continue_expr | for_expr
     | if_expr | match_expr | if_let_expr | while_let_expr
     | return_expr ;
```

#### Lvalues, rvalues and temporaries

**FIXME:** grammar?

#### Moved and copied types

**FIXME:** Do we want to capture this in the grammar as different productions?

### Literal expressions

See [Literals](#literals).
### Path expressions

See [Paths](#paths).

### Tuple expressions

```antlr
tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ;
```

### Unit expressions

```antlr
unit_expr : "()" ;
```

### Structure expressions

```antlr
struct_expr : expr_path '{' ident ':' expr
                      [ ',' ident ':' expr ] *
                      [ ".." expr ] '}'
            | expr_path '(' expr
                      [ ',' expr ] * ')'
            | expr_path ;
```

### Block expressions

```antlr
block_expr : '{' [ stmt ';' | item ] *
                 [ expr ] '}' ;
```

### Method-call expressions

```antlr
method_call_expr : expr '.' ident paren_expr_list ;
```

### Field expressions

```antlr
field_expr : expr '.' ident ;
```

### Array expressions

```antlr
array_expr : '[' "mut" ? array_elems? ']' ;

array_elems : [expr [',' expr]*] | [expr ';' expr] ;
```

### Index expressions

```antlr
idx_expr : expr '[' expr ']' ;
```

### Range expressions

```antlr
range_expr : expr ".." expr |
             expr ".." |
             ".." expr |
             ".." ;
```

### Unary operator expressions

```antlr
unop_expr : unop expr ;
unop : '-' | '*' | '!' ;
```

### Binary operator expressions

```antlr
binop_expr : expr binop expr | type_cast_expr
           | assignment_expr | compound_assignment_expr ;
binop : arith_op | bitwise_op | lazy_bool_op | comp_op ;
```

#### Arithmetic operators

```antlr
arith_op : '+' | '-' | '*' | '/' | '%' ;
```

#### Bitwise operators

```antlr
bitwise_op : '&' | '|' | '^' | "<<" | ">>" ;
```

#### Lazy boolean operators

```antlr
lazy_bool_op : "&&" | "||" ;
```

#### Comparison operators

```antlr
comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ;
```

#### Type cast expressions

```antlr
type_cast_expr : value "as" type ;
```

#### Assignment expressions

```antlr
assignment_expr : expr '=' expr ;
```

#### Compound assignment expressions

```antlr
compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ;
```

### Grouped expressions

```antlr
paren_expr : '(' expr ')' ;
```

### Call expressions

```antlr
expr_list : [ expr [ ',' expr ]* ] ?
          ;
paren_expr_list : '(' expr_list ')' ;
call_expr : expr paren_expr_list ;
```

### Lambda expressions

```antlr
ident_list : [ ident [ ',' ident ]* ] ? ;
lambda_expr : '|' ident_list '|' expr ;
```

### While loops

```antlr
while_expr : [ lifetime ':' ] "while" no_struct_literal_expr '{' block '}' ;
```

### Infinite loops

```antlr
loop_expr : [ lifetime ':' ] "loop" '{' block '}';
```

### Break expressions

```antlr
break_expr : "break" [ lifetime ];
```

### Continue expressions

```antlr
continue_expr : "continue" [ lifetime ];
```

### For expressions

```antlr
for_expr : [ lifetime ':' ] "for" pat "in" no_struct_literal_expr '{' block '}' ;
```

### If expressions

```antlr
if_expr : "if" no_struct_literal_expr '{' block '}'
          else_tail ? ;

else_tail : "else" [ if_expr | if_let_expr
                   | '{' block '}' ] ;
```

### Match expressions

```antlr
match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ;

match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ;

match_pat : pat [ '|' pat ] * [ "if" expr ] ? ;
```

### If let expressions

```antlr
if_let_expr : "if" "let" pat '=' expr '{' block '}'
               else_tail ? ;
else_tail : "else" [ if_expr | if_let_expr | '{' block '}' ] ;
```

### While let loops

```antlr
while_let_expr : "while" "let" pat '=' expr '{' block '}' ;
```

### Return expressions

```antlr
return_expr : "return" expr ? ;
```

# Type system

**FIXME:** is this entire chapter relevant here? Or should it all have been
covered by some production already?

## Types

### Primitive types

**FIXME:** grammar?

#### Machine types

**FIXME:** grammar?

#### Machine-dependent integer types

**FIXME:** grammar?

### Textual types

**FIXME:** grammar?

### Tuple types

**FIXME:** grammar?

### Array, and Slice types

**FIXME:** grammar?

### Structure types

**FIXME:** grammar?

### Enumerated types

**FIXME:** grammar?

### Pointer types

**FIXME:** grammar?

### Function types

**FIXME:** grammar?
### Closure types

```antlr
closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
                [ ':' bound-list ] [ '->' type ]
procedure_type := 'proc' [ '<' lifetime-list '>' ] '(' arg-list ')'
                  [ ':' bound-list ] [ '->' type ]
lifetime-list := lifetime | lifetime ',' lifetime-list
arg-list := ident ':' type | ident ':' type ',' arg-list
bound-list := bound | bound '+' bound-list
bound := path | lifetime
```

### Object types

**FIXME:** grammar?

### Type parameters

**FIXME:** grammar?

### Self types

**FIXME:** grammar?

## Type kinds

**FIXME:** this is probably not relevant to the grammar...

# Memory and concurrency models

**FIXME:** is this entire chapter relevant here? Or should it all have been
covered by some production already?

## Memory model

### Memory allocation and lifetime

### Memory ownership

### Variables

### Boxes

## Threads

### Communication between threads

### Thread lifecycle