diff --git a/src/doc/reference.md b/src/doc/reference.md index f5a4f12e5fa..9e477513237 100644 --- a/src/doc/reference.md +++ b/src/doc/reference.md @@ -29,41 +29,6 @@ You may also be interested in the [grammar]. # Notation -Rust's grammar is defined over Unicode code points, each conventionally denoted -`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is -confined to the ASCII range of Unicode, and is described in this document by a -dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF -supported by common automated LL(k) parsing tools such as `llgen`, rather than -the dialect given in ISO 14977. The dialect can be defined self-referentially -as follows: - -```{.ebnf .notation} -grammar : rule + ; -rule : nonterminal ':' productionrule ';' ; -productionrule : production [ '|' production ] * ; -production : term * ; -term : element repeats ; -element : LITERAL | IDENTIFIER | '[' productionrule ']' ; -repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ; -``` - -Where: - -- Whitespace in the grammar is ignored. -- Square brackets are used to group rules. -- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal - ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding - Unicode code point `U+00QQ`. -- `IDENTIFIER` is a nonempty string of ASCII letters and underscores. -- The `repeat` forms apply to the adjacent `element`, and are as follows: - - `?` means zero or one repetition - - `*` means zero or more repetitions - - `+` means one or more repetitions - - NUMBER trailing a repeat symbol gives a maximum repetition count - - NUMBER on its own gives an exact repetition count - -This EBNF dialect should hopefully be familiar to many readers. 
- ## Unicode productions A few productions in Rust's grammar permit Unicode code points outside the ASCII @@ -132,13 +97,6 @@ Some productions are defined by exclusion of particular Unicode characters: ## Comments -```{.ebnf .gram} -comment : block_comment | line_comment ; -block_comment : "/*" block_comment_body * "*/" ; -block_comment_body : [block_comment | character] * ; -line_comment : "//" non_eol * ; -``` - Comments in Rust code follow the general C++ style of line and block-comment forms. Nested block comments are supported. @@ -159,11 +117,6 @@ Non-doc comments are interpreted as a form of whitespace. ## Whitespace -```{.ebnf .gram} -whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ; -whitespace : [ whitespace_char | comment ] + ; -``` - The `whitespace_char` production is any nonempty Unicode string consisting of any of the following Unicode characters: `U+0020` (space, `' '`), `U+0009` (tab, `'\t'`), `U+000A` (LF, `'\n'`), `U+000D` (CR, `'\r'`). @@ -176,11 +129,6 @@ with any other legal whitespace element, such as a single space character. ## Tokens -```{.ebnf .gram} -simple_token : keyword | unop | binop ; -token : simple_token | ident | literal | symbol | whitespace token ; -``` - Tokens are primitive productions in the grammar defined by regular (non-recursive) languages. "Simple" tokens are given in [string table production](#string-table-productions) form, and occur in the rest of the @@ -218,11 +166,6 @@ of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule. A literal is a form of constant expression, so is evaluated (primarily) at compile time. 
-```{.ebnf .gram} -lit_suffix : ident; -literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit ] lit_suffix ?; -``` - The optional suffix is only used for certain numeric literals, but is reserved for future extension, that is, the above gives the lexical grammar, but a Rust parser will reject everything but the 12 special @@ -275,32 +218,6 @@ cases mentioned in [Number literals](#number-literals) below. #### Character and string literals -```{.ebnf .gram} -char_lit : '\x27' char_body '\x27' ; -string_lit : '"' string_body * '"' | 'r' raw_string ; - -char_body : non_single_quote - | '\x5c' [ '\x27' | common_escape | unicode_escape ] ; - -string_body : non_double_quote - | '\x5c' [ '\x22' | common_escape | unicode_escape ] ; -raw_string : '"' raw_string_body '"' | '#' raw_string '#' ; - -common_escape : '\x5c' - | 'n' | 'r' | 't' | '0' - | 'x' hex_digit 2 - -unicode_escape : 'u' '{' hex_digit+ 6 '}'; - -hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f' - | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' - | dec_digit ; -oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ; -dec_digit : '0' | nonzero_dec ; -nonzero_dec: '1' | '2' | '3' | '4' - | '5' | '6' | '7' | '8' | '9' ; -``` - ##### Character literals A _character literal_ is a single Unicode character enclosed within two @@ -349,11 +266,10 @@ following forms: Raw string literals do not process any escapes. They start with the character `U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`) and a -`U+0022` (double-quote) character. The _raw string body_ is not defined in the -EBNF grammar above: it can contain any sequence of Unicode characters and is -terminated only by another `U+0022` (double-quote) character, followed by the -same number of `U+0023` (`#`) characters that preceded the opening `U+0022` -(double-quote) character. +`U+0022` (double-quote) character. 
The _raw string body_ can contain any sequence +of Unicode characters and is terminated only by another `U+0022` (double-quote) +character, followed by the same number of `U+0023` (`#`) characters that preceded +the opening `U+0022` (double-quote) character. All Unicode characters contained in the raw string body represent themselves, the characters `U+0022` (double-quote) (except when followed by at least as @@ -375,19 +291,6 @@ r##"foo #"# bar"##; // foo #"# bar #### Byte and byte string literals -```{.ebnf .gram} -byte_lit : "b\x27" byte_body '\x27' ; -byte_string_lit : "b\x22" string_body * '\x22' | "br" raw_byte_string ; - -byte_body : ascii_non_single_quote - | '\x5c' [ '\x27' | common_escape ] ; - -byte_string_body : ascii_non_double_quote - | '\x5c' [ '\x22' | common_escape ] ; -raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ; - -``` - ##### Byte literals A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F` @@ -424,11 +327,10 @@ following forms: Raw byte string literals do not process any escapes. They start with the character `U+0062` (`b`), followed by `U+0072` (`r`), followed by zero or more of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The -_raw string body_ is not defined in the EBNF grammar above: it can contain any -sequence of ASCII characters and is terminated only by another `U+0022` -(double-quote) character, followed by the same number of `U+0023` (`#`) -characters that preceded the opening `U+0022` (double-quote) character. A raw -byte string literal can not contain any non-ASCII byte. +_raw string body_ can contain any sequence of ASCII characters and is terminated +only by another `U+0022` (double-quote) character, followed by the same number of +`U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) +character. A raw byte string literal can not contain any non-ASCII byte. 
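
The termination rule described above can be sketched with a few literals. This is a minimal illustration, not part of the reference text: the helper name `raw_byte_string_examples` is hypothetical, and the expected contents are written out as ordinary (escaped) byte strings for comparison.

```rust
// Hypothetical helper for illustration: returns three raw byte string
// literals, each opened with a different number of `#` characters.
fn raw_byte_string_examples() -> [&'static [u8]; 3] {
    [
        // No `#`: the first `"` ends the literal; `\x52` is NOT an escape here.
        br"\x52",
        // One `#`: a `"` only ends the literal when followed by one `#`,
        // so the inner double quotes are part of the body.
        br#"say "hi""#,
        // Two `#`: the sequence `"#` inside the body does not terminate it,
        // because two `#` characters are required after the closing quote.
        br##"foo #"# bar"##,
    ]
}

fn main() {
    let [plain, one_hash, two_hashes] = raw_byte_string_examples();

    // Four bytes: backslash, 'x', '5', '2'.
    assert_eq!(plain, &b"\\x52"[..]);
    assert_eq!(one_hash, &b"say \"hi\""[..]);
    assert_eq!(two_hashes, &b"foo #\"# bar"[..]);
}
```

Choosing the number of `#` characters is only a matter of picking a delimiter that does not occur in the body; the resulting bytes are the same whichever count is used.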
All characters contained in the raw string body represent their ASCII encoding, the characters `U+0022` (double-quote) (except when followed by at least as @@ -450,19 +352,6 @@ b"\\x52"; br"\x52"; // \x52 #### Number literals -```{.ebnf .gram} -num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ? - | '0' [ [ dec_digit | '_' ] * float_suffix ? - | 'b' [ '1' | '0' | '_' ] + - | 'o' [ oct_digit | '_' ] + - | 'x' [ hex_digit | '_' ] + ] ; - -float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ; - -exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ; -dec_lit : [ dec_digit | '_' ] + ; -``` - A _number literal_ is either an _integer literal_ or a _floating-point literal_. The grammar for recognizing the two kinds of literals is mixed. @@ -540,12 +429,6 @@ The two values of the boolean type are written `true` and `false`. ### Symbols -```{.ebnf .gram} -symbol : "::" | "->" - | '#' | '[' | ']' | '(' | ')' | '{' | '}' - | ',' | ';' ; -``` - Symbols are a general class of printable [token](#tokens) that play structural roles in a variety of grammar productions. They are catalogued here for completeness as the set of remaining miscellaneous printable tokens that do not @@ -555,16 +438,6 @@ operators](#binary-operator-expressions), or [keywords](#keywords). ## Paths -```{.ebnf .gram} -expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ; -expr_path_tail : '<' type_expr [ ',' type_expr ] + '>' - | expr_path ; - -type_path : ident [ type_path_tail ] + ; -type_path_tail : '<' type_expr [ ',' type_expr ] + '>' - | "::" type_path ; -``` - A _path_ is a sequence of one or more path components _logically_ separated by a namespace qualifier (`::`). If a path consists of only one component, it may refer to either an [item](#items) or a [variable](#variables) in a local control @@ -660,19 +533,6 @@ Users of `rustc` can define new syntax extensions in two ways: ## Macros -```{.ebnf .gram} -expr_macro_rules : "macro_rules" '!' 
ident '(' macro_rule * ')' ; -macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ; -matcher : '(' matcher * ')' | '[' matcher * ']' - | '{' matcher * '}' | '$' ident ':' ident - | '$' '(' matcher * ')' sep_token? [ '*' | '+' ] - | non_special_token ; -transcriber : '(' transcriber * ')' | '[' transcriber * ']' - | '{' transcriber * '}' | '$' ident - | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ] - | non_special_token ; -``` - `macro_rules` allows users to define syntax extension in a declarative way. We call such extensions "macros by example" or simply "macros" — to be distinguished from the "procedural macros" defined in [compiler plugins][plugin]. @@ -811,12 +671,6 @@ Crates contain [items](#items), each of which may have some number of ## Items -```{.ebnf .gram} -item : extern_crate_decl | use_decl | mod_item | fn_item | type_item - | struct_item | enum_item | static_item | trait_item | impl_item - | extern_block ; -``` - An _item_ is a component of a crate. Items are organized within a crate by a nested set of [modules](#modules). Every crate has a single "outermost" anonymous module; all further items within the crate have [paths](#paths) @@ -863,11 +717,6 @@ no notion of type abstraction: there are no first-class "forall" types. ### Modules -```{.ebnf .gram} -mod_item : "mod" ident ( ';' | '{' mod '}' ); -mod : item * ; -``` - A module is a container for zero or more [items](#items). A _module item_ is a module, surrounded in braces, named, and prefixed with the @@ -928,11 +777,6 @@ mod thread { ##### Extern crate declarations -```{.ebnf .gram} -extern_crate_decl : "extern" "crate" crate_name -crate_name: ident | ( string_lit "as" ident ) -``` - An _`extern crate` declaration_ specifies a dependency on an external crate. The external crate is then bound into the declaring scope as the `ident` provided in the `extern_crate_decl`. 
@@ -958,17 +802,6 @@ extern crate std as ruststd; // linking to 'std' under another name ##### Use declarations -```{.ebnf .gram} -use_decl : "pub" ? "use" [ path "as" ident - | path_glob ] ; - -path_glob : ident [ "::" [ path_glob - | '*' ] ] ? - | '{' path_item [ ',' path_item ] * '}' ; - -path_item : ident | "self" ; -``` - A _use declaration_ creates one or more local name bindings synonymous with some other [path](#paths). Usually a `use` declaration is used to shorten the path required to refer to a module item. These declarations may appear at the @@ -1413,10 +1246,6 @@ it were `Bar(i32)`, this is disallowed. ### Constant items -```{.ebnf .gram} -const_item : "const" ident ':' type '=' expr ';' ; -``` - A *constant item* is a named _constant value_ which is not associated with a specific memory location in the program. Constants are essentially inlined wherever they are used, meaning that they are copied directly into the relevant @@ -1453,10 +1282,6 @@ const BITS_N_STRINGS: BitsNStrings<'static> = BitsNStrings { ### Static items -```{.ebnf .gram} -static_item : "static" ident ':' type '=' expr ';' ; -``` - A *static item* is similar to a *constant*, except that it represents a precise memory location in the program. A static is never "inlined" at the usage site, and all references to it refer to the same memory location. Static items have @@ -1711,11 +1536,6 @@ impl Seq for u32 { ### External blocks -```{.ebnf .gram} -extern_block_item : "extern" '{' extern_block '}' ; -extern_block : [ foreign_fn ] * ; -``` - External blocks form the basis for Rust's foreign function interface. Declarations in an external block describe symbols in external, non-Rust libraries. @@ -1915,13 +1735,6 @@ the namespace hierarchy as it normally would. ## Attributes -```{.ebnf .gram} -attribute : '#' '!' ? '[' meta_item ']' ; -meta_item : ident [ '=' literal - | '(' meta_seq ')' ] ? ; -meta_seq : meta_item [ ',' meta_seq ] ? 
; -``` - Any item declaration may have an _attribute_ applied to it. Attributes in Rust are modeled on Attributes in ECMA-335, with the syntax coming from ECMA-334 (C#). An attribute is a general, free-form metadatum that is interpreted @@ -2554,11 +2367,6 @@ in meaning to declaring the item outside the statement block. #### Variable declarations -```{.ebnf .gram} -let_decl : "let" pat [':' type ] ? [ init ] ? ';' ; -init : [ '=' ] expr ; -``` - A _variable declaration_ introduces a new set of variable, given by a pattern. The pattern may be followed by a type annotation, and/or an initializer expression. When no type annotation is given, the compiler will infer the type, or signal @@ -2659,15 +2467,6 @@ the same name. ### Structure expressions -```{.ebnf .gram} -struct_expr : expr_path '{' ident ':' expr - [ ',' ident ':' expr ] * - [ ".." expr ] '}' | - expr_path '(' expr - [ ',' expr ] * ')' | - expr_path ; -``` - There are several forms of structure expressions. A _structure expression_ consists of the [path](#paths) of a [structure item](#structures), followed by a brace-enclosed list of one or more comma-separated name-value pairs, @@ -2718,11 +2517,6 @@ Point3d {y: 0, z: 10, .. base}; ### Block expressions -```{.ebnf .gram} -block_expr : '{' [ stmt ';' | item ] * - [ expr ] '}' ; -``` - A _block expression_ is similar to a module in terms of the declarations that are possible. Each block conceptually introduces a new namespace scope. Use items can bring new names into scopes and declared items are in scope for only @@ -2745,10 +2539,6 @@ assert_eq!(5, x); ### Method-call expressions -```{.ebnf .gram} -method_call_expr : expr '.' ident paren_expr_list ; -``` - A _method call_ consists of an expression followed by a single dot, an identifier, and a parenthesized expression-list. 
Method calls are resolved to methods on specific traits, either statically dispatching to a method if the @@ -2757,10 +2547,6 @@ the left-hand-side expression is an indirect [trait object](#trait-objects). ### Field expressions -```{.ebnf .gram} -field_expr : expr '.' ident ; -``` - A _field expression_ consists of an expression followed by a single dot and an identifier, when not immediately followed by a parenthesized expression-list (the latter is a [method call expression](#method-call-expressions)). A field @@ -2781,12 +2567,6 @@ automatically dereferenced to make the field access possible. ### Array expressions -```{.ebnf .gram} -array_expr : '[' "mut" ? array_elems? ']' ; - -array_elems : [expr [',' expr]*] | [expr ';' expr] ; -``` - An [array](#array,-and-slice-types) _expression_ is written by enclosing zero or more comma-separated expressions of uniform type in square brackets. @@ -2803,10 +2583,6 @@ constant expression that can be evaluated at compile time, such as a ### Index expressions -```{.ebnf .gram} -idx_expr : expr '[' expr ']' ; -``` - [Array](#array,-and-slice-types)-typed expressions can be indexed by writing a square-bracket-enclosed expression (the index) after them. When the array is mutable, the resulting [lvalue](#lvalues,-rvalues-and-temporaries) can @@ -2823,13 +2599,6 @@ _panicked state_. ### Range expressions -```{.ebnf .gram} -range_expr : expr ".." expr | - expr ".." | - ".." expr | - ".." ; -``` - The `..` operator will construct an object of one of the `std::ops::Range` variants. ``` @@ -2872,10 +2641,6 @@ before the expression they apply to. ### Binary operator expressions -```{.ebnf .gram} -binop_expr : expr binop expr ; -``` - Binary operators expressions are given in terms of [operator precedence](#operator-precedence). @@ -3036,10 +2801,6 @@ An expression enclosed in parentheses evaluates to the result of the enclosed expression. Parentheses can be used to explicitly specify evaluation order within an expression. 
-```{.ebnf .gram} -paren_expr : '(' expr ')' ; -``` - An example of a parenthesized expression: ``` @@ -3049,12 +2810,6 @@ let x: i32 = (2 + 3) * 4; ### Call expressions -```{.ebnf .gram} -expr_list : [ expr [ ',' expr ]* ] ? ; -paren_expr_list : '(' expr_list ')' ; -call_expr : expr paren_expr_list ; -``` - A _call expression_ invokes a function, providing zero or more input variables and an optional location to move the function's output into. If the function eventually returns, then the expression completes. @@ -3070,11 +2825,6 @@ let pi: Result = "3.14".parse(); ### Lambda expressions -```{.ebnf .gram} -ident_list : [ ident [ ',' ident ]* ] ? ; -lambda_expr : '|' ident_list '|' expr ; -``` - A _lambda expression_ (sometimes called an "anonymous function expression") defines a function and denotes it as a value, in a single expression. A lambda expression is a pipe-symbol-delimited (`|`) list of identifiers followed by an @@ -3118,10 +2868,6 @@ ten_times(|j| println!("hello, {}", j)); A `loop` expression denotes an infinite loop. -```{.ebnf .gram} -loop_expr : [ lifetime ':' ] "loop" '{' block '}'; -``` - A `loop` expression may optionally have a _label_. The label is written as a lifetime preceding the loop expression, as in `'foo: loop{ }`. If a label is present, then labeled `break` and `continue` expressions nested @@ -3131,10 +2877,6 @@ expressions](#continue-expressions). ### Break expressions -```{.ebnf .gram} -break_expr : "break" [ lifetime ]; -``` - A `break` expression has an optional _label_. If the label is absent, then executing a `break` expression immediately terminates the innermost loop enclosing it. It is only permitted in the body of a loop. If the label is @@ -3143,10 +2885,6 @@ be the innermost label enclosing the `break` expression, but must enclose it. ### Continue expressions -```{.ebnf .gram} -continue_expr : "continue" [ lifetime ]; -``` - A `continue` expression has an optional _label_. 
If the label is absent, then executing a `continue` expression immediately terminates the current iteration of the innermost loop enclosing it, returning control to the loop *head*. In @@ -3160,10 +2898,6 @@ A `continue` expression is only permitted in the body of a loop. ### While loops -```{.ebnf .gram} -while_expr : [ lifetime ':' ] "while" no_struct_literal_expr '{' block '}' ; -``` - A `while` loop begins by evaluating the boolean loop conditional expression. If the loop conditional expression evaluates to `true`, the loop body block executes and control returns to the loop conditional expression. If the loop @@ -3187,10 +2921,6 @@ loops](#infinite-loops), [break expressions](#break-expressions), and ### For expressions -```{.ebnf .gram} -for_expr : [ lifetime ':' ] "for" pat "in" no_struct_literal_expr '{' block '}' ; -``` - A `for` expression is a syntactic construct for looping over elements provided by an implementation of `std::iter::Iterator`. @@ -3226,14 +2956,6 @@ loops](#infinite-loops), [break expressions](#break-expressions), and ### If expressions -```{.ebnf .gram} -if_expr : "if" no_struct_literal_expr '{' block '}' - else_tail ? ; - -else_tail : "else" [ if_expr | if_let_expr - | '{' block '}' ] ; -``` - An `if` expression is a conditional branch in program control. The form of an `if` expression is a condition expression, followed by a consequent block, any number of `else if` conditions and blocks, and an optional trailing `else` @@ -3246,14 +2968,6 @@ if` condition is evaluated. If all `if` and `else if` conditions evaluate to ### Match expressions -```{.ebnf .gram} -match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ; - -match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ; - -match_pat : pat [ '|' pat ] * [ "if" expr ] ? ; -``` - A `match` expression branches on a *pattern*. The exact form of matching that occurs depends on the pattern. 
Patterns consist of some combination of literals, destructured arrays or enum constructors, structures and tuples, @@ -3370,12 +3084,6 @@ let message = match maybe_digit { ### If let expressions -```{.ebnf .gram} -if_let_expr : "if" "let" pat '=' expr '{' block '}' - else_tail ? ; -else_tail : "else" [ if_expr | if_let_expr | '{' block '}' ] ; -``` - An `if let` expression is semantically identical to an `if` expression but in place of a condition expression it expects a refutable let statement. If the value of the expression on the right hand side of the let statement matches the pattern, the corresponding @@ -3383,10 +3091,6 @@ block will execute, otherwise flow proceeds to the first `else` block that follo ### While let loops -```{.ebnf .gram} -while_let_expr : "while" "let" pat '=' expr '{' block '}' ; -``` - A `while let` loop is semantically identical to a `while` loop but in place of a condition expression it expects a refutable let statement. If the value of the expression on the right hand side of the let statement matches the pattern, the @@ -3395,10 +3099,6 @@ Otherwise, the while expression completes. ### Return expressions -```{.ebnf .gram} -return_expr : "return" expr ? ; -``` - Return expressions are denoted with the keyword `return`. Evaluating a `return` expression moves its argument into the designated output location for the current function call, destroys the current function activation frame, and