% Macros

# Introduction

Functions are the primary tool that programmers can use to build abstractions.
Sometimes, however, programmers want to abstract over compile-time syntax
rather than run-time values. Macros provide syntactic abstraction. For an
example of how this can be useful, consider the following two code fragments,
which both pattern-match on their input and both return early in one case,
doing nothing otherwise:

~~~~
# enum T { SpecialA(u32), SpecialB(u32) }
# fn f() -> u32 {
# let input_1 = T::SpecialA(0);
# let input_2 = T::SpecialA(0);
match input_1 {
    T::SpecialA(x) => { return x; }
    _ => {}
}
// ...
match input_2 {
    T::SpecialB(x) => { return x; }
    _ => {}
}
# return 0;
# }
~~~~

This code could become tiresome if repeated many times. However, no function
can capture this pattern: the `return` has to exit the function that contains
the `match`, and a helper function can only return from itself. Rust's macro
system can eliminate the repetition. Macros are lightweight custom syntax
extensions, themselves defined using the `macro_rules!` syntax extension. The
following `early_return` macro captures the pattern in the above code:

~~~~
# enum T { SpecialA(u32), SpecialB(u32) }
# fn f() -> u32 {
# let input_1 = T::SpecialA(0);
# let input_2 = T::SpecialA(0);
macro_rules! early_return {
    ($inp:expr, $sp:path) => ( // invoke it like `(input_5, T::SpecialE)`
        match $inp {
            $sp(x) => { return x; }
            _ => {}
        }
    );
}
// ...
early_return!(input_1, T::SpecialA);
// ...
early_return!(input_2, T::SpecialB);
# return 0;
# }
# fn main() {}
~~~~

Macros are defined in pattern-matching style: in the above example, the text
`($inp:expr, $sp:path)` that appears on the left-hand side of the `=>` is the
*macro invocation syntax*, a pattern denoting how to write a call to the
macro. The text on the right-hand side of the `=>`, beginning with `match
$inp`, is the *macro transcription syntax*: what the macro expands to.

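To see the expansion at work, here is a minimal self-contained sketch that calls `early_return!` from an ordinary function. The function `first_special` and its two parameters are hypothetical, introduced only for illustration. Each invocation expands in place to a `match`, so its `return` leaves `first_special` itself, which is the step a helper function could not perform.

```rust
enum T { SpecialA(u32), SpecialB(u32) }

macro_rules! early_return {
    ($inp:expr, $sp:path) => (
        match $inp {
            // This `return` exits the function that *invokes* the macro.
            $sp(x) => { return x; }
            _ => {}
        }
    );
}

// Hypothetical function: returns the payload of `a` if it is `SpecialA`,
// otherwise the payload of `b` if it is `SpecialB`, otherwise 0.
fn first_special(a: T, b: T) -> u32 {
    early_return!(a, T::SpecialA);
    early_return!(b, T::SpecialB);
    0
}

fn main() {
    println!("{}", first_special(T::SpecialA(1), T::SpecialB(2)));
    println!("{}", first_special(T::SpecialB(5), T::SpecialB(2)));
    println!("{}", first_special(T::SpecialB(5), T::SpecialA(9)));
}
```

The three calls print 1, 2 and 0: first the `a` argument matches, then only `b` does, then neither does.
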
# Invocation syntax

The macro invocation syntax specifies the syntax for the arguments to the
macro. It appears on the left-hand side of the `=>` in a macro definition. It
conforms to the following rules:

1. It must be surrounded by a pair of delimiters: parentheses, square
   brackets, or braces. Parentheses are the usual choice.
2. `$` has special meaning (described below).
3. The `()`s, `[]`s, and `{}`s it contains must balance. For example, `([)` is
   forbidden.
4. Some arguments can be followed only by a limited set of separators, to
   avoid ambiguity (described below).

Otherwise, the invocation syntax is free-form.

To take a fragment of Rust code as an argument, write `$` followed by a name
(for use on the right-hand side), followed by a `:`, followed by a *fragment
specifier*. The fragment specifier denotes the sort of fragment to match. The
most common fragment specifiers are:

* `ident` (an identifier, referring to a variable or item. Examples: `f`, `x`,
  `foo`.)
* `expr` (an expression. Examples: `2 + 2`; `if true { 1 } else { 2 }`;
  `f(42)`.)
* `ty` (a type. Examples: `i32`, `Vec<(char, String)>`, `&T`.)
* `path` (a path, such as the name of a struct or enum variant. Example:
  `T::SpecialA`.)
* `pat` (a pattern, usually appearing in a `match` or on the left-hand side of
  a declaration. Examples: `Some(t)`; `(17, 'a')`; `_`.)
* `block` (a sequence of statements enclosed in braces. Example:
  `{ println!("hi"); return 12; }`.)

The parser interprets any token that's not preceded by a `$` literally, and
Rust's usual rules of tokenization apply. So `($x:ident -> (($e:expr)))`, though
excessively fancy, would designate a macro that could be invoked like
`my_macro!(i->(( 2+2 )))`.

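As a quick check of these rules, here is one possible definition behind that invocation. The name `my_macro` comes from the text; the body, which returns the identifier's name next to the expression's value, is an assumption chosen only so the result is easy to inspect. `stringify!` is the standard macro that turns its input tokens into a string literal.

```rust
macro_rules! my_macro {
    // `->` and the doubled parentheses are literal tokens that must appear
    // in the invocation; only `$x` and `$e` capture fragments.
    ($x:ident -> (($e:expr))) => (
        (stringify!($x), $e)
    );
}

fn main() {
    // Whitespace between tokens does not matter: `i->(( 2+2 ))` tokenizes
    // the same way as `i -> ((2 + 2))`.
    let (name, value) = my_macro!(i->(( 2+2 )));
    println!("{} = {}", name, value);
}
```
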
To avoid ambiguity, macro invocation syntax must conform to the following rules:

* `expr` must be followed by `=>`, `,` or `;`.
* `ty` and `path` must be followed by `=>`, `,`, `:`, `=`, `>` or `as`.
* `pat` must be followed by `=>`, `,` or `=`.
* `ident` and `block` can be followed by any token.

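The following sketch shows a matcher that stays within these rules: each `expr` is followed by `,`, the `pat` by `=>`, and the `ty` ends the matcher. The macro name `match_as` and what it does (match a value and cast the extracted result) are hypothetical, chosen only to exercise several fragment kinds at once.

```rust
// Hypothetical macro: if `$value` matches `$p`, evaluate `$out` and cast it
// to `$t`; otherwise produce `None`.
macro_rules! match_as {
    // `expr` -> `,`   `pat` -> `=>`   `expr` -> `,`   `ty` ends the matcher
    ($value:expr, $p:pat => $out:expr, $t:ty) => (
        match $value {
            $p => Some($out as $t),
            _ => None,
        }
    );
}

fn main() {
    let widened: Option<u32> = match_as!(Some(3u8), Some(n) => n, u32);
    println!("{:?}", widened);
}
```
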
## Invocation location

A macro invocation may take the place of (and therefore expand to) an
expression, item, statement, or pattern. The Rust parser will parse the macro
invocation as a "placeholder" for whichever syntactic form is appropriate for
the location.

At expansion time, the output of the macro will be parsed as whichever of
these four nonterminals it stands in for. This means that a single macro might,
for example, expand to an item or an expression, depending on its arguments
(and cause a syntax error if it is called with the wrong argument for its
location). Although this behavior sounds excessively dynamic, it is known to
be useful under some circumstances.

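A minimal sketch of the same macro in different locations, assuming a current Rust compiler. Both macro names are hypothetical: `answer!` is invoked in expression and in pattern position, and `make_fn!` is invoked where an item is expected and expands to a function definition.

```rust
// Hypothetical macro used in both expression and pattern position.
macro_rules! answer {
    () => (42);
}

// Hypothetical macro that expands to an item (a `fn` definition).
macro_rules! make_fn {
    ($name:ident) => (
        fn $name() -> u32 { answer!() }
    );
}

// Item position: defines `fn the_answer() -> u32`.
make_fn!(the_answer);

fn is_answer(n: u32) -> bool {
    match n {
        // Pattern position: expands to the literal pattern `42`.
        answer!() => true,
        _ => false,
    }
}

fn main() {
    // Expression position.
    let n: u32 = answer!();
    println!("{} {} {}", n, the_answer(), is_answer(n));
}
```
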
# Transcription syntax

The right-hand side of the `=>` follows the same rules as the left-hand side,
except that a `$` need only be followed by the name of the syntactic fragment
to transcribe into the macro expansion; its fragment specifier need not be
repeated.

The right-hand side must be enclosed by delimiters, which the transcriber
ignores. Therefore `() => ((1,2,3))` is a macro that expands to a tuple
expression, `($x:ident, $val:expr) => (let $x = $val)` is a macro that expands
to a statement, and `() => (1,2,3)` is a macro that expands to a syntax error
(since the transcriber interprets the parentheses on the right-hand side as
delimiters, and `1,2,3` is not a valid Rust expression on its own).

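A short sketch of the delimiter rule, with hypothetical macro names. The outermost delimiters of each right-hand side are dropped, so a tuple needs a second pair of parentheses, and braces work as outer delimiters just as well. `declare!` ends its transcription with a `;` so that the expansion is a complete `let` statement; that trailing semicolon is a choice made for this sketch.

```rust
// The outer `()` are delimiters and are dropped; the inner `()` build the tuple.
macro_rules! tuple_parens { () => ((1, 2, 3)) }

// Braces as outer delimiters give the same expansion.
macro_rules! tuple_braces { () => { (1, 2, 3) } }

// Expands to a `let` statement. The name `$x` comes from the caller, so the
// binding is usable after the invocation.
macro_rules! declare { ($x:ident, $val:expr) => { let $x = $val; } }

fn main() {
    declare!(total, 10);
    let (a, b, c) = tuple_parens!();
    println!("{}", total + a + b + c);
}
```
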
Except for the permissibility of `$name` (and `$(...)*`, discussed below), the
right-hand side of a macro definition is ordinary Rust syntax. In particular,
macro invocations (including invocations of the macro currently being defined)
are permitted in expression, statement, and item locations. However, nothing
else about the code is examined or executed by the macro system; execution
still has to wait until run-time.

## Interpolation location

The interpolation `$argument_name` may appear in any location consistent with
its fragment specifier (i.e., if it is specified as `ident`, it may be used
anywhere an identifier is permitted).

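In the hypothetical `getter!` macro below, one `ident` fragment appears in two different identifier positions (a method name and a field name) and a `ty` fragment appears as a return type. The `Point` struct is likewise only an illustration.

```rust
// Hypothetical macro: generates a method that returns a copy of a field.
macro_rules! getter {
    ($field:ident : $t:ty) => (
        // `$field` is used both as the method name and as the field name.
        fn $field(&self) -> $t { self.$field }
    );
}

struct Point { x: i32, y: i32 }

impl Point {
    // Macro invocations may also appear among the items of an `impl` block.
    getter!(x: i32);
    getter!(y: i32);
}

fn main() {
    let p = Point { x: 3, y: -4 };
    println!("({}, {})", p.x(), p.y());
}
```
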
# Multiplicity

## Invocation

Going back to the motivating example, recall that `early_return` expanded into
a `match` that would `return` if the `match`'s scrutinee matched the
"special case" path provided as the second argument to `early_return`,
and do nothing otherwise. Now suppose that we wanted to write a
version of `early_return` that could handle a variable number of "special"
cases.

The syntax `$(...)*` on the left-hand side of the `=>` in a macro definition
accepts zero or more occurrences of its contents. It works much
like the `*` operator in regular expressions. It also supports a
separator token (a comma-separated list could be written `$(...),*`), and `+`
instead of `*` to mean "at least one".

~~~~
# enum T { SpecialA(u32), SpecialB(u32), SpecialC(u32), SpecialD(u32) }
# fn f() -> u32 {
# let input_1 = T::SpecialA(0);
# let input_2 = T::SpecialA(0);
macro_rules! early_return {
    ($inp:expr, [ $($sp:path),+ ]) => (
        match $inp {
            $(
                $sp(x) => { return x; }
            )+
            _ => {}
        }
    )
}
// ...
early_return!(input_1, [T::SpecialA,T::SpecialC,T::SpecialD]);
// ...
early_return!(input_2, [T::SpecialB]);
# return 0;
# }
# fn main() {}
~~~~

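The difference between `*` and `+`, and the use of a separator, can be seen in a pair of small hypothetical macros. Both accept comma-separated expressions; `sum!` also accepts an empty argument list, while `sum_nonempty!()` would fail to match any rule and be rejected at compile time.

```rust
// `*`: zero or more comma-separated expressions. `sum!()` expands to `0`.
macro_rules! sum {
    ($($x:expr),*) => (0 $(+ $x)*);
}

// `+`: at least one expression is required.
macro_rules! sum_nonempty {
    ($($x:expr),+) => (0 $(+ $x)+);
}

fn main() {
    println!("{} {} {}", sum!(), sum!(1, 2, 3), sum_nonempty!(4));
}
```
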
## Transcription

As the above example demonstrates, `$(...)*` is also valid on the right-hand
side of a macro definition. The behavior of `*` in transcription,
especially in cases where multiple `*`s are nested and multiple different
names are involved, can seem somewhat magical and unintuitive at first. The
system that interprets them is called "Macro By Example". The two rules to
keep in mind are (1) the behavior of `$(...)*` is to walk through one "layer"
of repetitions for all of the `$name`s it contains in lockstep, and (2) each
`$name` must be under at least as many `$(...)*`s as it was matched against.
If it is under more, it'll be repeated, as appropriate.

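Both rules can be seen in one small sketch, with a hypothetical macro name. `$k` and `$v` are matched in the same repetition, so the transcriber walks through them in lockstep (rule 1). `$scale` is matched outside any repetition, so placing it inside `$(...)*` repeats it once per pair (rule 2).

```rust
// Hypothetical macro: builds a vector of (name, value * scale) pairs.
macro_rules! scaled_pairs {
    ($scale:expr; $($k:ident => $v:expr),*) => (
        // `$k` and `$v` advance together; `$scale` is copied into every pair.
        vec![$( (stringify!($k), $v * $scale) ),*]
    );
}

fn main() {
    let pairs = scaled_pairs!(10; a => 1, b => 2);
    println!("{:?}", pairs);
}
```
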
## Parsing limitations

For technical reasons, there are two limitations to the treatment of syntax
fragments by the macro parser:

1. The parser will always parse as much as possible of a Rust syntactic
   fragment. For example, if the comma were omitted from the syntax of
   `early_return!` above, `input_1 [` would've been interpreted as the beginning
   of an array index. In fact, invoking the macro would have been impossible.
2. The parser must have eliminated all ambiguity by the time it reaches a
   `$name:fragment_specifier` declaration. This limitation can result in parse
   errors when declarations occur at the beginning of, or immediately after,
   a `$(...)*`. For example, the grammar `$($t:ty)* $e:expr` will always fail to
   parse because the parser would be forced to choose between parsing `t` and
   parsing `e`. Changing the invocation syntax to require a distinctive token in
   front can solve the problem. In the above example, `$(T $t:ty,)* E $e:expr`
   solves the problem; the `,` is needed because `T` and `E` are not among the
   tokens allowed to follow a `ty` fragment.

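The distinctive-token fix can be sketched as follows. The macro name `sizes_plus` and its behavior (adding the byte sizes of the listed types, via the standard `std::mem::size_of`, to a final expression) are illustrative assumptions. What matters is that `T` and `E` are plain tokens, so the parser knows before reaching each fragment whether another type follows or the final expression begins, and each `$t:ty` is followed by a permitted `,`.

```rust
macro_rules! sizes_plus {
    // `T` and `E` are literal tokens, so the parser never has to guess
    // whether the next fragment is a `ty` or an `expr`.
    ($(T $t:ty,)* E $e:expr) => (
        0 $(+ ::std::mem::size_of::<$t>())* + $e
    );
}

fn main() {
    // 1 byte for `u8`, 4 bytes for `u32`, plus 10.
    println!("{}", sizes_plus!(T u8, T u32, E 10));
    // With no types, the expansion is just `0 + 7`.
    println!("{}", sizes_plus!(E 7));
}
```
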
# Macro argument pattern matching

## Motivation

Now consider code like the following:

~~~~
# enum T1 { Good1(T2, u32), Bad1}
# struct T2 { body: T3 }
# enum T3 { Good2(u32), Bad2}
# fn f(x: T1) -> u32 {
match x {
    T1::Good1(g1, val) => {
        match g1.body {
            T3::Good2(result) => {
                // complicated stuff goes here
                return result + val;
            },
            _ => panic!("Didn't get good_2")
        }
    }
    _ => return 0 // default value
}
# }
# fn main() {}
~~~~

All the complicated stuff is deeply indented, and the error-handling code is
separated from matches that fail. We'd like to write a macro that performs
a match, but with a syntax that suits the problem better. The following macro
can solve the problem:

~~~~
macro_rules! biased_match {
    // special case: bind a single name with `let x = ...` rather than `let (x) = ...`
    ( ($e:expr) -> ($p:pat) else $err:stmt ;
      binds $bind_res:ident
    ) => (
        let $bind_res = match $e {
            $p => ( $bind_res ),
            _ => { $err }
        };
    );
    // more than one name; use a tuple
    ( ($e:expr) -> ($p:pat) else $err:stmt ;
      binds $( $bind_res:ident ),*
    ) => (
        let ( $( $bind_res ),* ) = match $e {
            $p => ( $( $bind_res ),* ),
            _ => { $err }
        };
    )
}

# enum T1 { Good1(T2, u32), Bad1}
# struct T2 { body: T3 }
# enum T3 { Good2(u32), Bad2}
# fn f(x: T1) -> u32 {
biased_match!((x) -> (T1::Good1(g1, val)) else { return 0 };
              binds g1, val );
biased_match!((g1.body) -> (T3::Good2(result) )
              else { panic!("Didn't get good_2") };
              binds result );
// complicated stuff goes here
return result + val;
# }
# fn main() {}
~~~~

This solves the indentation problem. But if we have a lot of chained matches
like this, we might prefer to write a single macro invocation. The input
pattern we want is clear:

~~~~
# fn main() {}
# macro_rules! b {
    ( $( ($e:expr) -> ($p:pat) else $err:stmt ; )*
      binds $( $bind_res:ident ),*
    )
# => (0) }
~~~~

However, a `$(...)*` repetition transcribes its contents one copy after
another, so it cannot directly expand this input into nested `match`
expressions. But there is a solution.

## The recursive approach to macro writing

A macro may accept multiple different input grammars. The first one to
successfully match the actual argument to a macro invocation is the one that
"wins".

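A small sketch of first-match-wins dispatch; the macro name `describe` and its string results are illustrative only. Because the rules are tried in order, the literal-token rule has to come before the catch-all `expr` rule. With the order reversed, `$e:expr` would also accept `0`, and the literal rule would never be used.

```rust
// Hypothetical macro: rules are tried top to bottom.
macro_rules! describe {
    // Literal token: matches only an invocation that is exactly `0`.
    (0) => ("literal zero");
    // Fallback: any other single expression.
    ($e:expr) => ("some other expression");
}

fn main() {
    println!("{}", describe!(0));
    println!("{}", describe!(1 + 1));
}
```

The recursive `biased_match!` macro developed next relies on the same ordering rule to tell its base case apart from its recursive case.
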
In the case of `biased_match!`, we want to write a recursive macro to
process the semicolon-terminated lines one by one. So, we want the following
input patterns:

~~~~
# macro_rules! b {
    ( binds $( $bind_res:ident ),* )
# => (0) }
# fn main() {}
~~~~

...and:

~~~~
# fn main() {}
# macro_rules! b {
    ( ($e :expr) -> ($p :pat) else $err :stmt ;
      $( ($e_rest:expr) -> ($p_rest:pat) else $err_rest:stmt ; )*
      binds $( $bind_res:ident ),*
    )
# => (0) }
~~~~

The resulting macro looks like this. Note that the separation into
`biased_match!` and `biased_match_rec!` occurs only because we have an outer
piece of syntax (the `let`) which we only want to transcribe once.

~~~~
# fn main() {

macro_rules! biased_match_rec {
    // Handle the first layer
    ( ($e :expr) -> ($p :pat) else $err :stmt ;
      $( ($e_rest:expr) -> ($p_rest:pat) else $err_rest:stmt ; )*
      binds $( $bind_res:ident ),*
    ) => (
        match $e {
            $p => {
                // Recursively handle the next layer
                biased_match_rec!($( ($e_rest) -> ($p_rest) else $err_rest ; )*
                                  binds $( $bind_res ),*
                )
            }
            _ => { $err }
        }
    );
    // Produce the requested values
    ( binds $( $bind_res:ident ),* ) => ( ($( $bind_res ),*) )
}

// Wrap the whole thing in a `let`.
macro_rules! biased_match {
    // special case: bind a single name with `let x = ...` rather than `let (x) = ...`
    ( $( ($e:expr) -> ($p:pat) else $err:stmt ; )*
      binds $bind_res:ident
    ) => (
        let $bind_res = biased_match_rec!(
            $( ($e) -> ($p) else $err ; )*
            binds $bind_res
        );
    );
    // more than one name: use a tuple
    ( $( ($e:expr) -> ($p:pat) else $err:stmt ; )*
      binds $( $bind_res:ident ),*
    ) => (
        let ( $( $bind_res ),* ) = biased_match_rec!(
            $( ($e) -> ($p) else $err ; )*
            binds $( $bind_res ),*
        );
    )
}

# enum T1 { Good1(T2, u32), Bad1}
# struct T2 { body: T3 }
# enum T3 { Good2(u32), Bad2}
# fn f(x: T1) -> u32 {
biased_match!(
    (x) -> (T1::Good1(g1, val)) else { return 0 };
    (g1.body) -> (T3::Good2(result) ) else { panic!("Didn't get Good2") };
    binds val, result );
// complicated stuff goes here
return result + val;
# }
# }
~~~~

This technique applies to many cases where transcribing a result all at once is
not possible. The resulting code resembles ordinary functional programming in
some respects, but differs from it in some important ways.

The first difference is important, but also easy to forget: the transcription
(right-hand) side of a `macro_rules!` rule is literal syntax, which can only
be executed at run-time. If a piece of transcription syntax does not itself
appear inside another macro invocation, it will become part of the final
program. If it is inside a macro invocation (for example, the recursive
invocation of `biased_match_rec!`), it does have the opportunity to affect
transcription, but only through the process of attempted pattern matching.

The second, related, difference is that the evaluation order of macros feels
"backwards" compared to ordinary programming. Given an invocation
`m1!(m2!())`, the expander first expands `m1!`, giving it as input the literal
syntax `m2!()`. If it transcribes its argument unchanged into an appropriate
position (in particular, not as an argument to yet another macro invocation),
the expander will then proceed to evaluate `m2!()` (along with any other macro
invocations that the expansion of `m1!(m2!())` produced).

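The standard `stringify!` and `concat!` macros make this outside-in order visible. `passthrough!` is a hypothetical macro that uses the `tt` ("token tree") fragment specifier, which is not in the list above; it captures tokens without parsing them, so the argument is transcribed unchanged into expression position.

```rust
// Hypothetical macro: re-emits its argument tokens unchanged.
macro_rules! passthrough {
    ($($t:tt)*) => ($($t)*);
}

// `stringify!` is expanded first and receives the literal tokens
// `concat!("a", "b")`, so the inner invocation is never expanded.
fn outer_first() -> &'static str {
    stringify!(concat!("a", "b"))
}

// `passthrough!` places `concat!("a", "b")` in expression position, so the
// expander goes on to expand it, producing "ab".
fn expanded_later() -> &'static str {
    passthrough!(concat!("a", "b"))
}

fn main() {
    println!("{}", outer_first());
    println!("{}", expanded_later());
}
```

The exact spacing of the string produced by `stringify!` may vary between compiler versions, but it contains the unexpanded `concat!` call rather than `"ab"`.
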
# Hygiene

To prevent clashes, Rust implements
[hygienic macros](http://en.wikipedia.org/wiki/Hygienic_macro).

As an example, a label introduced inside a macro will not clash with a `loop`
or `for` loop label of the same name at the call site (labels are discussed
in the lifetimes guide). The following code will print "Hello!" only once:

~~~
macro_rules! loop_x {
    ($e: expr) => (
        // $e will not interact with this 'x
        'x: loop {
            println!("Hello!");
            $e
        }
    );
}

fn main() {
    'x: loop {
        loop_x!(break 'x);
        println!("I am never printed.");
    }
}
~~~

The two `'x` labels do not clash. If they did, `break 'x` would exit only the
inner loop, so the program would print "I am never printed." and run forever.

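Local variables are treated the same way. In the hypothetical `add_own_x!` macro below, the macro's own `x` and an `x` written by the caller inside `$e` are distinct variables, even though they share a name.

```rust
// Hypothetical macro: binds its own `x`, then evaluates the caller's expression.
macro_rules! add_own_x {
    ($e:expr) => ({
        let x = 1; // this `x` belongs to the macro definition
        $e + x     // any `x` inside `$e` still refers to the caller's variable
    });
}

fn main() {
    let x = 10;
    // `$e` is `x * 2` using the caller's `x` (10), so the result is 20 + 1.
    println!("{}", add_own_x!(x * 2));
}
```
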
# Scoping and macro import/export

Macros are expanded at an early stage in compilation, before name resolution.
One downside is that scoping works differently for macros, compared to other
constructs in the language.

Definition and expansion of macros both happen in a single depth-first,
lexical-order traversal of a crate's source. So a macro defined at module scope
is visible to any subsequent code in the same module, which includes the body
of any subsequent child `mod` items.

A macro defined within the body of a single `fn`, or anywhere else not at
module scope, is visible only within that item.

If a module has the `macro_use` attribute, its macros are also visible in its
parent module after the child's `mod` item. If the parent also has `macro_use`
then the macros will be visible in the grandparent after the parent's `mod`
item, and so forth.

The `macro_use` attribute can also appear on `extern crate`. In this context
it controls which macros are loaded from the external crate, e.g.

```rust,ignore
#[macro_use(foo, bar)]
extern crate baz;
```

If the attribute is given simply as `#[macro_use]`, all macros are loaded. If
there is no `#[macro_use]` attribute then no macros are loaded. Only macros
defined with the `#[macro_export]` attribute may be loaded.

To load a crate's macros *without* linking it into the output, use `#[no_link]`
as well.

An example:

```rust
macro_rules! m1 { () => (()) }

// visible here: m1

mod foo {
    // visible here: m1

    #[macro_export]
    macro_rules! m2 { () => (()) }

    // visible here: m1, m2
}

// visible here: m1

macro_rules! m3 { () => (()) }

// visible here: m1, m3

#[macro_use]
mod bar {
    // visible here: m1, m3

    macro_rules! m4 { () => (()) }

    // visible here: m1, m3, m4
}

// visible here: m1, m3, m4
# fn main() { }
```

When this library is loaded with `#[macro_use] extern crate`, only `m2` will
be imported.

The Rust Reference has a [listing of macro-related
attributes](../reference.html#macro--and-plugin-related-attributes).

# The variable `$crate`

A further difficulty occurs when a macro is used in multiple crates. Say that
`mylib` defines

```rust
pub fn increment(x: u32) -> u32 {
    x + 1
}

#[macro_export]
macro_rules! inc_a {
    ($x:expr) => ( ::increment($x) )
}

#[macro_export]
macro_rules! inc_b {
    ($x:expr) => ( ::mylib::increment($x) )
}
# fn main() { }
```

`inc_a` only works within `mylib`, while `inc_b` only works outside the
library. Furthermore, `inc_b` will break if the user imports `mylib` under
another name.

Rust does not (yet) have a hygiene system for crate references, but it does
provide a simple workaround for this problem. Within a macro imported from a
crate named `foo`, the special macro variable `$crate` will expand to `::foo`.
By contrast, when a macro is defined and then used in the same crate, `$crate`
will expand to nothing. This means we can write

```rust
#[macro_export]
macro_rules! inc {
    ($x:expr) => ( $crate::increment($x) )
}
# fn main() { }
```

to define a single macro that works both inside and outside our library. The
function name will expand to either `::increment` or `::mylib::increment`.

To keep this system simple and correct, `#[macro_use] extern crate ...` may
only appear at the root of your crate, not inside `mod`. This ensures that
`$crate` is a single identifier.

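The in-crate half of this behavior can be tried in a single file. In this sketch the modules `util` and `other` are hypothetical, and it assumes a current compiler; because `$crate` refers to the root of the defining crate, the path `$crate::util::increment` resolves the same way whether `inc!` is invoked at the crate root or inside a nested module.

```rust
pub mod util {
    pub fn increment(x: u32) -> u32 {
        x + 1
    }
}

// `$crate` names the crate that defines the macro, so the path does not
// depend on where the macro is invoked.
#[macro_export]
macro_rules! inc {
    ($x:expr) => ( $crate::util::increment($x) )
}

mod other {
    // `inc!` is in scope here because it is defined earlier in the file.
    pub fn call() -> u32 {
        inc!(1)
    }
}

fn main() {
    println!("{} {}", inc!(41), other::call());
}
```
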
# A final note

Macros, as currently implemented, are not for the faint of heart. Even
ordinary syntax errors can be more difficult to debug when they occur inside a
macro, and errors caused by parse problems in generated code can be very
tricky. Invoking the `log_syntax!` macro can help elucidate intermediate
states, invoking `trace_macros!(true)` will automatically print those
intermediate states out, and passing the flag `--pretty expanded` as a
command-line argument to the compiler will show the result of expansion.

If Rust's macro system can't do what you need, you may want to write a
[compiler plugin](plugins.html) instead. Compared to `macro_rules!`
macros, this is significantly more work, the interfaces are much less stable,
and the warnings about debugging apply ten-fold. In exchange you get the
flexibility of running arbitrary Rust code within the compiler. Syntax
extension plugins are sometimes called *procedural macros* for this reason.