% Macros
Introduction
Functions are the primary tool that programmers can use to build abstractions. Sometimes, however, programmers want to abstract over compile-time syntax rather than run-time values. Macros provide syntactic abstraction. For an example of how this can be useful, consider the following two code fragments, which both pattern-match on their input and both return early in one case, doing nothing otherwise:
# enum T { SpecialA(u32), SpecialB(u32) }
# fn f() -> u32 {
# let input_1 = T::SpecialA(0);
# let input_2 = T::SpecialA(0);
match input_1 {
    T::SpecialA(x) => { return x; }
    _ => {}
}
// ...
match input_2 {
    T::SpecialB(x) => { return x; }
    _ => {}
}
# return 0;
# }
This code could become tiresome if repeated many times. However, no function can abstract the repetition away, because a function cannot perform an early return on behalf of its caller.
Rust's macro system can eliminate the repetition. Macros are lightweight custom syntax extensions, themselves defined using the macro_rules! syntax extension. The following early_return macro captures the pattern in the above code:
# enum T { SpecialA(u32), SpecialB(u32) }
# fn f() -> u32 {
# let input_1 = T::SpecialA(0);
# let input_2 = T::SpecialA(0);
macro_rules! early_return {
    ($inp:expr, $sp:path) => ( // invoke it like `(input_5, SpecialE)`
        match $inp {
            $sp(x) => { return x; }
            _ => {}
        }
    );
}
// ...
early_return!(input_1, T::SpecialA);
// ...
early_return!(input_2, T::SpecialB);
# return 0;
# }
# fn main() {}
Macros are defined in pattern-matching style: in the above example, the text ($inp:expr, $sp:path) that appears on the left-hand side of the => is the macro invocation syntax, a pattern denoting how to write a call to the macro. The text on the right-hand side of the =>, beginning with match $inp, is the macro transcription syntax: what the macro expands to.
Invocation syntax
The macro invocation syntax specifies the syntax for the arguments to the macro. It appears on the left-hand side of the => in a macro definition. It conforms to the following rules:
- It must be surrounded by parentheses.
- $ has special meaning (described below).
- The ()s, []s, and {}s it contains must balance. For example, ([) is forbidden.
- Some arguments can be followed only by a limited set of separators, to avoid ambiguity (described below).
Otherwise, the invocation syntax is free-form.
To take a fragment of Rust code as an argument, write $ followed by a name (for use on the right-hand side), followed by a :, followed by a fragment specifier. The fragment specifier denotes the sort of fragment to match. The most common fragment specifiers are:
- ident (an identifier, referring to a variable or item. Examples: f, x, foo.)
- expr (an expression. Examples: 2 + 2; if true { 1 } else { 2 }; f(42).)
- ty (a type. Examples: i32, Vec<(char, String)>, &T.)
- path (a path to a struct or enum variant. Example: T::SpecialA)
- pat (a pattern, usually appearing in a match or on the left-hand side of a declaration. Examples: Some(t); (17, 'a'); _.)
- block (a sequence of actions. Example: { log(error, "hi"); return 12; })
The parser interprets any token that's not preceded by a $ literally. Rust's usual rules of tokenization apply, so ($x:ident -> (($e:expr))), though excessively fancy, would designate a macro that could be invoked like: my_macro!(i->(( 2+2 ))).
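As a minimal sketch of the literal-token rule, the following defines a macro with that invocation syntax. The name bind_expr and the choice to expand to a let statement are assumptions made for illustration only.

```rust
// Hypothetical macro for illustration; `bind_expr` is not a standard macro.
macro_rules! bind_expr {
    // `->` and both pairs of parentheses carry no `$`, so the caller
    // must write them exactly as they appear here.
    ($x:ident -> (($e:expr))) => {
        let $x = $e;
    };
}

fn main() {
    // Whitespace is irrelevant because matching happens on tokens.
    bind_expr!(i->(( 2+2 )));
    println!("{}", i); // prints 4
}
```

Leaving out the arrow or one pair of parentheses makes the invocation fail to match, since no other rule is available.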
To avoid ambiguity, macro invocation syntax must conform to the following rules:
- expr must be followed by =>, a comma (,), or a semicolon (;).
- ty and path must be followed by =>, a comma (,), :, =, >, or as.
- pat must be followed by =>, a comma (,), or =.
- ident and block can be followed by any token.
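The separator rules can be seen in a small sketch. The macro name pairs is hypothetical; the point is only that each expr fragment is followed by a token from its permitted set (=> after the key, a comma after the value).

```rust
// Hypothetical macro for illustration; `pairs` is not a standard macro.
macro_rules! pairs {
    // `$k:expr` is followed by `=>` and `$v:expr` by `,`;
    // both tokens are permitted after an `expr` fragment.
    ($($k:expr => $v:expr),*) => {{
        let mut map = ::std::collections::HashMap::new();
        $( map.insert($k, $v); )*
        map
    }};
}

fn main() {
    let m = pairs!("one" => 1, "two" => 2);
    println!("{}", m["two"]); // prints 2
}
```

Choosing a separator outside the permitted set for expr causes the macro definition itself to be rejected.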
Invocation location
A macro invocation may take the place of (and therefore expand to) an expression, item, statement, or pattern. The Rust parser will parse the macro invocation as a "placeholder" for whichever syntactic form is appropriate for the location.
At expansion time, the output of the macro will be parsed as whichever of these syntactic forms it stands in for. This means that a single macro might, for example, expand to an item or an expression, depending on its arguments (and cause a syntax error if it is called with the wrong argument for its location). Although this behavior sounds excessively dynamic, it is known to be useful under some circumstances.
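A minimal sketch of one macro that serves two locations, depending on its arguments. The names answer and get_answer are invented for this example.

```rust
// Hypothetical macro for illustration.
macro_rules! answer {
    // Called as `answer!(fn name);` at item position: expands to a function.
    (fn $name:ident) => {
        fn $name() -> u32 { 42 }
    };
    // Called as `answer!()` at expression position: expands to a literal.
    () => { 42u32 };
}

// Item location.
answer!(fn get_answer);

fn main() {
    // Expression location.
    let n = answer!() + get_answer();
    println!("{}", n); // prints 84
}
```

Writing answer!(); at item position would expand to a bare literal where an item is expected, which is the kind of syntax error the paragraph above describes.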
Transcription syntax
The right-hand side of the => follows the same rules as the left-hand side, except that a $ need only be followed by the name of the syntactic fragment to transcribe into the macro expansion; its type need not be repeated.
The right-hand side must be enclosed by delimiters, which the transcriber ignores. Therefore () => ((1,2,3)) is a macro that expands to a tuple expression, ($x:ident, $val:expr) => (let $x = $val) is a macro that expands to a statement, and () => (1,2,3) is a macro that expands to a syntax error (since the transcriber interprets the parentheses on the right-hand side as delimiters, and 1,2,3 is not a valid Rust expression on its own).
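The role of the outer delimiters can be checked with a short sketch. The macro name triple is hypothetical.

```rust
// Hypothetical macro for illustration.
macro_rules! triple {
    // The outer parentheses only delimit the transcription;
    // the inner parentheses form the tuple expression.
    () => ((1, 2, 3));
}

fn main() {
    let (a, b, c) = triple!();
    println!("{} {} {}", a, b, c); // prints 1 2 3
}
```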
Except for permissibility of $name (and $(...)*, discussed below), the right-hand side of a macro definition is ordinary Rust syntax. In particular, macro invocations (including invocations of the macro currently being defined) are permitted in expression, statement, and item locations. However, nothing else about the code is examined or executed by the macro system; execution still has to wait until run-time.
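As a sketch of a macro that invokes itself in its own transcription, the following counts its arguments. The name count_args is an assumption for illustration; the expansion is a sum that is only evaluated once the program runs (or is folded by the compiler), never by the macro system itself.

```rust
// Hypothetical macro for illustration.
macro_rules! count_args {
    // No arguments left: contribute zero.
    () => { 0usize };
    // Peel off one argument and recurse on the rest.
    ($head:expr $(, $tail:expr)*) => {
        1usize + count_args!($($tail),*)
    };
}

fn main() {
    // Expands to `1 + (1 + (1 + 0))`; the arguments are never evaluated.
    println!("{}", count_args!(10, 20, 30)); // prints 3
}
```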
Interpolation location
The interpolation $argument_name may appear in any location consistent with its fragment specifier (i.e., if it is specified as ident, it may be used anywhere an identifier is permitted).
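A minimal sketch of interpolation at different locations: an ident used as a function name, a ty used as a return type, and an expr used as a function body. The macro name make_getter and the generated functions are hypothetical.

```rust
// Hypothetical macro for illustration.
macro_rules! make_getter {
    // `$name` lands where an identifier is expected, `$ty` where a type
    // is expected, and `$value` where an expression is expected.
    ($name:ident, $ty:ty, $value:expr) => {
        fn $name() -> $ty { $value }
    };
}

make_getter!(port, u16, 8080);
make_getter!(host, &'static str, "localhost");

fn main() {
    println!("{}:{}", host(), port()); // prints localhost:8080
}
```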
Multiplicity
Invocation
Going back to the motivating example, recall that early_return expanded into a match that would return if the match's scrutinee matched the "special case" path provided as the second argument to early_return, and do nothing otherwise. Now suppose that we wanted to write a version of early_return that could handle a variable number of "special" cases.
The syntax $(...)* on the left-hand side of the => in a macro definition accepts zero or more occurrences of its contents. It works much like the * operator in regular expressions. It also supports a separator token (a comma-separated list could be written $(...),*), and + instead of * to mean "at least one."
# enum T { SpecialA(u32), SpecialB(u32), SpecialC(u32), SpecialD(u32) }
# fn f() -> u32 {
# let input_1 = T::SpecialA(0);
# let input_2 = T::SpecialA(0);
macro_rules! early_return {
    ($inp:expr, [ $($sp:path),+ ]) => (
        match $inp {
            $(
                $sp(x) => { return x; }
            )+
            _ => {}
        }
    )
}
// ...
early_return!(input_1, [T::SpecialA,T::SpecialC,T::SpecialD]);
// ...
early_return!(input_2, [T::SpecialB]);
# return 0;
# }
# fn main() {}
Transcription
As the above example demonstrates, $(...)* is also valid on the right-hand side of a macro definition. The behavior of * in transcription, especially in cases where multiple *s are nested, and multiple different names are involved, can seem somewhat magical and unintuitive at first. The system that interprets them is called "Macro By Example." The two rules to keep in mind are (1) the behavior of $(...)* is to walk through one "layer" of repetitions for all of the $names it contains in lockstep, and (2) each $name must be under at least as many $(...)*s as it was matched against. If it is under more, it'll be repeated, as appropriate.
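A sketch of nested repetition may make the two rules more concrete. The macro sum_rows is hypothetical: it matches $x two layers deep (rows separated by semicolons, values separated by commas) and transcribes it under two layers as well.

```rust
// Hypothetical macro for illustration.
macro_rules! sum_rows {
    // Outer layer: rows separated by `;`. Inner layer: values separated by `,`.
    ($( $( $x:expr ),* );*) => {
        // The outer `$(...),*` walks the rows; the inner `$(...)*` walks
        // the values of the current row.
        vec![ $( 0 $( + $x )* ),* ]
    };
}

fn main() {
    let sums = sum_rows!(1, 2, 3; 10, 20);
    println!("{:?}", sums); // prints [6, 30]
}
```

Using $x under only one layer in the transcription would violate rule (2), and the macro definition would be rejected.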
Parsing limitations
For technical reasons, there are two limitations to the treatment of syntax fragments by the macro parser:
- The parser will always parse as much as possible of a Rust syntactic fragment. For example, if the comma were omitted from the syntax of early_return! above, input_1 [ would've been interpreted as the beginning of an array index. In fact, invoking the macro would have been impossible.
- The parser must have eliminated all ambiguity by the time it reaches a $name:fragment_specifier declaration. This limitation can result in parse errors when declarations occur at the beginning of, or immediately after, a $(...)*. For example, the grammar $($t:ty)* $e:expr will always fail to parse because the parser would be forced to choose between parsing t and parsing e. Changing the invocation syntax to require a distinctive token in front can solve the problem. In the above example, $(T $t:ty)* E $e:expr solves the problem.
Macro argument pattern matching
Motivation
Now consider code like the following:
# enum T1 { Good1(T2, u32), Bad1}
# struct T2 { body: T3 }
# enum T3 { Good2(u32), Bad2}
# fn f(x: T1) -> u32 {
match x {
    T1::Good1(g1, val) => {
        match g1.body {
            T3::Good2(result) => {
                // complicated stuff goes here
                return result + val;
            },
            _ => panic!("Didn't get good_2")
        }
    }
    _ => return 0 // default value
}
# }
# fn main() {}
All the complicated stuff is deeply indented, and the error-handling code is separated from matches that fail. We'd like to write a macro that performs a match, but with a syntax that suits the problem better. The following macro can solve the problem:
macro_rules! biased_match {
    // special case: `let (x) = ...` is illegal, so use `let x = ...` instead
    ( ($e:expr) -> ($p:pat) else $err:stmt ;
      binds $bind_res:ident
    ) => (
        let $bind_res = match $e {
            $p => ( $bind_res ),
            _ => { $err }
        };
    );
    // more than one name; use a tuple
    ( ($e:expr) -> ($p:pat) else $err:stmt ;
      binds $( $bind_res:ident ),*
    ) => (
        let ( $( $bind_res ),* ) = match $e {
            $p => ( $( $bind_res ),* ),
            _ => { $err }
        };
    )
}
# enum T1 { Good1(T2, u32), Bad1}
# struct T2 { body: T3 }
# enum T3 { Good2(u32), Bad2}
# fn f(x: T1) -> u32 {
biased_match!((x) -> (T1::Good1(g1, val)) else { return 0 };
              binds g1, val );
biased_match!((g1.body) -> (T3::Good2(result) )
                  else { panic!("Didn't get good_2") };
              binds result );
// complicated stuff goes here
return result + val;
# }
# fn main() {}
This solves the indentation problem. But if we have a lot of chained matches like this, we might prefer to write a single macro invocation. The input pattern we want is clear:
# fn main() {}
# macro_rules! b {
( $( ($e:expr) -> ($p:pat) else $err:stmt ; )*
binds $( $bind_res:ident ),*
)
# => (0) }
However, it's not possible to directly expand to nested match statements. But there is a solution.
The recursive approach to macro writing
A macro may accept multiple different input grammars. The first one to successfully match the actual argument to a macro invocation is the one that "wins."
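A minimal sketch of rule ordering, using a hypothetical macro named describe. The rules are tried from top to bottom, so the more specific rule has to come first.

```rust
// Hypothetical macro for illustration.
macro_rules! describe {
    // Tried first: matches only the single literal token `0`.
    (0) => { "zero" };
    // Tried second: matches any expression, including `0 + 0`.
    ($n:expr) => { "some other expression" };
}

fn main() {
    println!("{}", describe!(0));     // prints zero
    println!("{}", describe!(7));     // prints some other expression
    println!("{}", describe!(0 + 0)); // prints some other expression
}
```

If the two rules were swapped, the expr rule would match every input and the literal rule would never be reached.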
In the case of the biased_match example above, we want to write a recursive macro to process the semicolon-terminated lines, one-by-one. So, we want the following input patterns:
# macro_rules! b {
( binds $( $bind_res:ident ),* )
# => (0) }
# fn main() {}
...and:
# fn main() {}
# macro_rules! b {
( ($e :expr) -> ($p :pat) else $err :stmt ;
$( ($e_rest:expr) -> ($p_rest:pat) else $err_rest:stmt ; )*
binds $( $bind_res:ident ),*
)
# => (0) }
The resulting macro looks like this. Note that the separation into biased_match! and biased_match_rec! occurs only because we have an outer piece of syntax (the let) which we only want to transcribe once.
# fn main() {
macro_rules! biased_match_rec {
    // Handle the first layer
    (   ($e     :expr) -> ($p     :pat) else $err     :stmt ;
     $( ($e_rest:expr) -> ($p_rest:pat) else $err_rest:stmt ; )*
     binds $( $bind_res:ident ),*
    ) => (
        match $e {
            $p => {
                // Recursively handle the next layer
                biased_match_rec!($( ($e_rest) -> ($p_rest) else $err_rest ; )*
                                  binds $( $bind_res ),*
                )
            }
            _ => { $err }
        }
    );
    // Produce the requested values
    ( binds $( $bind_res:ident ),* ) => ( ($( $bind_res ),*) )
}
// Wrap the whole thing in a `let`.
macro_rules! biased_match {
    // special case: `let (x) = ...` is illegal, so use `let x = ...` instead
    ( $( ($e:expr) -> ($p:pat) else $err:stmt ; )*
      binds $bind_res:ident
    ) => (
        let $bind_res = biased_match_rec!(
            $( ($e) -> ($p) else $err ; )*
            binds $bind_res
        );
    );
    // more than one name: use a tuple
    ( $( ($e:expr) -> ($p:pat) else $err:stmt ; )*
      binds $( $bind_res:ident ),*
    ) => (
        let ( $( $bind_res ),* ) = biased_match_rec!(
            $( ($e) -> ($p) else $err ; )*
            binds $( $bind_res ),*
        );
    )
}
# enum T1 { Good1(T2, u32), Bad1}
# struct T2 { body: T3 }
# enum T3 { Good2(u32), Bad2}
# fn f(x: T1) -> u32 {
biased_match!(
    (x)       -> (T1::Good1(g1, val)) else { return 0 };
    (g1.body) -> (T3::Good2(result) ) else { panic!("Didn't get Good2") };
    binds val, result );
// complicated stuff goes here
return result + val;
# }
# }
This technique applies to many cases where transcribing a result all at once is not possible. The resulting code resembles ordinary functional programming in some respects, but differs from it in two important ways.
The first difference is important, but also easy to forget: the transcription (right-hand) side of a macro_rules! rule is literal syntax, which can only be executed at run-time. If a piece of transcription syntax does not itself appear inside another macro invocation, it will become part of the final program. If it is inside a macro invocation (for example, the recursive invocation of biased_match_rec!), it does have the opportunity to affect transcription, but only through the process of attempted pattern matching.
The second, related, difference is that the evaluation order of macros feels "backwards" compared to ordinary programming. Given an invocation m1!(m2!()), the expander first expands m1!, giving it as input the literal syntax m2!(). If it transcribes its argument unchanged into an appropriate position (in particular, not as an argument to yet another macro invocation), the expander will then proceed to evaluate m2!() (along with any other macro invocations m1!(m2!()) produced).
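The outside-in order can be observed with a small sketch. Both macros are hypothetical; concat! and stringify! are standard macros. The first macro sees its argument as unexpanded tokens, while the second transcribes its argument into an expression position, where it is expanded afterwards.

```rust
// Hypothetical macros for illustration.
macro_rules! show_input {
    // Turns whatever tokens were passed into a string, without expanding them.
    ($($t:tt)*) => { stringify!($($t)*) };
}

macro_rules! pass_through {
    // Transcribes the argument unchanged, so the expander handles it next.
    ($e:expr) => { $e };
}

fn main() {
    // The outer macro receives the literal syntax of the inner invocation.
    let raw = show_input!(concat!("a", "b"));
    assert!(raw.contains("concat"));

    // Here the inner invocation survives transcription and is then expanded.
    let expanded = pass_through!(concat!("a", "b"));
    assert_eq!(expanded, "ab");

    println!("{} / {}", raw, expanded);
}
```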
Hygiene
To prevent clashes, Rust implements hygienic macros.
As an example, loop and for-loop labels (discussed in the lifetimes guide) will not clash. The following code will print "Hello!" only once:
macro_rules! loop_x {
    ($e: expr) => (
        // $e will not interact with this 'x
        'x: loop {
            println!("Hello!");
            $e
        }
    );
}
fn main() {
    'x: loop {
        loop_x!(break 'x);
        println!("I am never printed.");
    }
}
The two 'x names did not clash. Had they clashed, the loop would have printed "I am never printed" and run forever.
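Hygiene also applies to local variables introduced by a macro, which is a property of macro_rules! rather than something shown in the label example. A minimal sketch, with a hypothetical macro named times_two:

```rust
// Hypothetical macro for illustration.
macro_rules! times_two {
    ($e:expr) => {{
        // This `x` belongs to the macro definition and is invisible to `$e`.
        let x = 2;
        $e * x
    }};
}

fn main() {
    let x = 10;
    // The `x` passed in still refers to the caller's variable, so the
    // result is 10 * 2 rather than 2 * 2.
    println!("{}", times_two!(x)); // prints 20
}
```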
Scoping and macro import/export
Macros are expanded at an early stage in compilation, before name resolution. One downside is that scoping works differently for macros, compared to other constructs in the language.
Definition and expansion of macros both happen in a single depth-first, lexical-order traversal of a crate's source. So a macro defined at module scope is visible to any subsequent code in the same module, which includes the body of any subsequent child mod items.
A macro defined within the body of a single fn, or anywhere else not at module scope, is visible only within that item.
If a module has the macro_use attribute, its macros are also visible in its parent module after the child's mod item. If the parent also has macro_use then the macros will be visible in the grandparent after the parent's mod item, and so forth.
The macro_use attribute can also appear on extern crate. In this context it controls which macros are loaded from the external crate, e.g.
#[macro_use(foo, bar)]
extern crate baz;
If the attribute is given simply as #[macro_use], all macros are loaded. If there is no #[macro_use] attribute then no macros are loaded. Only macros defined with the #[macro_export] attribute may be loaded.
To load a crate's macros without linking it into the output, use #[no_link] as well.
An example:
macro_rules! m1 { () => (()) }
// visible here: m1
mod foo {
    // visible here: m1
    #[macro_export]
    macro_rules! m2 { () => (()) }
    // visible here: m1, m2
}
// visible here: m1
macro_rules! m3 { () => (()) }
// visible here: m1, m3
#[macro_use]
mod bar {
    // visible here: m1, m3
    macro_rules! m4 { () => (()) }
    // visible here: m1, m3, m4
}
// visible here: m1, m3, m4
# fn main() { }
When this library is loaded with #[macro_use] extern crate, only m2 will be imported.
The Rust Reference has a listing of macro-related attributes.
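The module-level scoping rules can be exercised in a single file. This is a minimal sketch; the module helpers, the macro square, and the function area are all hypothetical names.

```rust
#[macro_use]
mod helpers {
    // Without `#[macro_use]` on `mod helpers`, this macro would be
    // visible only inside the module.
    macro_rules! square {
        ($x:expr) => { $x * $x };
    }
}

// `square!` is visible here because this code follows the `mod` item.
fn area(side: u32) -> u32 {
    square!(side)
}

fn main() {
    println!("{}", area(3)); // prints 9
}
```

Moving the definition of area above mod helpers would make the invocation fail, since the macro would not yet be defined at that point in the traversal.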
The variable $crate
A further difficulty occurs when a macro is used in multiple crates. Say that mylib defines
pub fn increment(x: u32) -> u32 {
    x + 1
}
#[macro_export]
macro_rules! inc_a {
    ($x:expr) => ( ::increment($x) )
}
#[macro_export]
macro_rules! inc_b {
    ($x:expr) => ( ::mylib::increment($x) )
}
# fn main() { }
inc_a only works within mylib, while inc_b only works outside the library. Furthermore, inc_b will break if the user imports mylib under another name.
Rust does not (yet) have a hygiene system for crate references, but it does provide a simple workaround for this problem. Within a macro imported from a crate named foo, the special macro variable $crate will expand to ::foo. By contrast, when a macro is defined and then used in the same crate, $crate will expand to nothing. This means we can write
#[macro_export]
macro_rules! inc {
    ($x:expr) => ( $crate::increment($x) )
}
# fn main() { }
to define a single macro that works both inside and outside our library. The function name will expand to either ::increment or ::mylib::increment.
To keep this system simple and correct, #[macro_use] extern crate ... may only appear at the root of your crate, not inside mod. This ensures that $crate is a single identifier.
A final note
Macros, as currently implemented, are not for the faint of heart. Even ordinary syntax errors can be more difficult to debug when they occur inside a macro, and errors caused by parse problems in generated code can be very tricky. Invoking the log_syntax! macro can help elucidate intermediate states, invoking trace_macros!(true) will automatically print those intermediate states out, and passing the flag --pretty expanded as a command-line argument to the compiler will show the result of expansion.
If Rust's macro system can't do what you need, you may want to write a compiler plugin instead. Compared to macro_rules! macros, this is significantly more work, the interfaces are much less stable, and the warnings about debugging apply ten-fold. In exchange you get the flexibility of running arbitrary Rust code within the compiler. Syntax extension plugins are sometimes called procedural macros for this reason.