A new matcher representation for use in parse_tt.

`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
  state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
  and end of the matcher) that are represented by an index that exceeds
  the end of the `&[TokenTree]`, which is clumsy and error-prone.

This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.

Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
  `idx` against `len` and read comments to understand what the result
  means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
  three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
  one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
  latter was never executed. There's now a single place issuing these
  errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
  `async-std-1.10.0` (an extreme example of heavy macro use) drops from
  11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
  that it no longer needs boxing, which partly accounts for the
  reduction in allocations.
- The rest of the drop in allocations is due to the removal of
  `MatcherKind`, because we no longer need to record anything for the
  parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
This commit is contained in:
Nicholas Nethercote 2022-04-01 10:19:16 +11:00
parent 2ad4eb207b
commit 88f8fbcce0
2 changed files with 298 additions and 343 deletions

View File

@ -1,7 +1,5 @@
#![feature(associated_type_bounds)] #![feature(associated_type_bounds)]
#![feature(associated_type_defaults)] #![feature(associated_type_defaults)]
#![feature(box_patterns)]
#![feature(box_syntax)]
#![feature(crate_visibility_modifier)] #![feature(crate_visibility_modifier)]
#![feature(decl_macro)] #![feature(decl_macro)]
#![feature(if_let_guard)] #![feature(if_let_guard)]

View File

@ -17,7 +17,7 @@
//! A "matcher position" (a.k.a. "position" or "mp") is a dot in the middle of a matcher, usually //! A "matcher position" (a.k.a. "position" or "mp") is a dot in the middle of a matcher, usually
//! written as a `·`. For example `· a $( a )* a b` is one, as is `a $( · a )* a b`. //! written as a `·`. For example `· a $( a )* a b` is one, as is `a $( · a )* a b`.
//! //!
//! The parser walks through the input a character at a time, maintaining a list //! The parser walks through the input a token at a time, maintaining a list
//! of threads consistent with the current position in the input string: `cur_mps`. //! of threads consistent with the current position in the input string: `cur_mps`.
//! //!
//! As it processes them, it fills up `eof_mps` with threads that would be valid if //! As it processes them, it fills up `eof_mps` with threads that would be valid if
@ -73,12 +73,13 @@
crate use NamedMatch::*; crate use NamedMatch::*;
crate use ParseResult::*; crate use ParseResult::*;
use crate::mbe::{self, SequenceRepetition, TokenTree}; use crate::mbe::{KleeneOp, TokenTree};
use rustc_ast::token::{self, DocComment, Nonterminal, Token, TokenKind}; use rustc_ast::token::{self, DocComment, Nonterminal, NonterminalKind, Token};
use rustc_parse::parser::{NtOrTt, Parser}; use rustc_parse::parser::{NtOrTt, Parser};
use rustc_session::parse::ParseSess; use rustc_session::parse::ParseSess;
use rustc_span::symbol::MacroRulesNormalizedIdent; use rustc_span::symbol::MacroRulesNormalizedIdent;
use rustc_span::Span;
use smallvec::{smallvec, SmallVec}; use smallvec::{smallvec, SmallVec};
@ -88,153 +89,88 @@
use std::borrow::Cow; use std::borrow::Cow;
use std::collections::hash_map::Entry::{Occupied, Vacant}; use std::collections::hash_map::Entry::{Occupied, Vacant};
// One element is enough to cover 95-99% of vectors for most benchmarks. Also, // One element is enough to cover 95-99% of vectors for most benchmarks. Also, vectors longer than
// vectors longer than one frequently have many elements, not just two or // one frequently have many elements, not just two or three.
// three.
type NamedMatchVec = SmallVec<[NamedMatch; 1]>; type NamedMatchVec = SmallVec<[NamedMatch; 1]>;
// This type is used a lot. Make sure it doesn't unintentionally get bigger. // This type is used a lot. Make sure it doesn't unintentionally get bigger.
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))] #[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
rustc_data_structures::static_assert_size!(NamedMatchVec, 48); rustc_data_structures::static_assert_size!(NamedMatchVec, 48);
#[derive(Clone)] /// A unit within a matcher that a `MatcherPos` can refer to. Similar to (and derived from)
enum MatcherKind<'tt> { /// `mbe::TokenTree`, but designed specifically for fast and easy traversal during matching.
TopLevel, /// Notable differences to `mbe::TokenTree`:
Delimited(Box<DelimitedSubmatcher<'tt>>), /// - It is non-recursive, i.e. there is no nesting.
Sequence(Box<SequenceSubmatcher<'tt>>), /// - The end pieces of each sequence (the separator, if present, and the Kleene op) are
} /// represented explicitly, as is the very end of the matcher.
///
#[derive(Clone)] /// This means a matcher can be represented by `&[MatcherLoc]`, and traversal mostly involves
struct DelimitedSubmatcher<'tt> { /// simply incrementing the current matcher position index by one.
parent: Parent<'tt>, enum MatcherLoc<'tt> {
} Token {
token: &'tt Token,
#[derive(Clone)] },
struct SequenceSubmatcher<'tt> { Delimited,
parent: Parent<'tt>, Sequence {
seq: &'tt SequenceRepetition, op: KleeneOp,
} num_metavar_decls: usize,
idx_first_after: usize,
/// Data used to ascend from a submatcher back to its parent matcher. A subset of the fields from next_metavar: usize,
/// `MathcherPos`.
#[derive(Clone)]
struct Parent<'tt> {
tts: &'tt [TokenTree],
idx: usize,
kind: MatcherKind<'tt>,
}
/// A single matcher position, which could be within the top-level matcher, a submatcher, a
/// subsubmatcher, etc. For example:
/// ```text
/// macro_rules! m { $id:ident ( $($e:expr),* ) } => { ... }
/// <----------> second submatcher; one tt, one metavar
/// <--------------> first submatcher; three tts, zero metavars
/// <--------------------------> top-level matcher; two tts, one metavar
/// ```
struct MatcherPos<'tt> {
/// The tokens that make up the current matcher. When we are within a `Sequence` or `Delimited`
/// submatcher, this is just the contents of that submatcher.
tts: &'tt [TokenTree],
/// The "dot" position within the current submatcher, i.e. the index into `tts`. Can go one or
/// two positions past the final elements in `tts` when dealing with sequences, see
/// `parse_tt_inner` for details.
idx: usize,
/// This vector ends up with one element per metavar in the *top-level* matcher, even when this
/// `MatcherPos` is for a submatcher. Each element records token trees matched against the
/// relevant metavar by the black box parser. The element will be a `MatchedSeq` if the
/// corresponding metavar is within a sequence.
matches: Lrc<NamedMatchVec>,
/// The number of sequences this mp is within.
seq_depth: usize, seq_depth: usize,
},
SequenceKleeneOpNoSep {
op: KleeneOp,
idx_first: usize,
},
SequenceSep {
separator: &'tt Token,
},
SequenceKleeneOpAfterSep {
idx_first: usize,
},
MetaVarDecl {
span: Span,
bind: Ident,
kind: NonterminalKind,
next_metavar: usize,
seq_depth: usize,
},
Eof,
}
/// The position in `matches` of the next metavar to be matched against the source token /// A single matcher position, representing the state of matching.
/// stream. Should not be used if there are no metavars. struct MatcherPos {
match_cur: usize, /// The index into `TtParser::locs`, which represents the "dot".
idx: usize,
/// What kind of matcher we are in. For submatchers, this contains enough information to /// The matches made against metavar decls so far. On a successful match, this vector ends up
/// reconstitute a `MatcherPos` within the parent once we ascend out of the submatcher. /// with one element per metavar decl in the matcher. Each element records token trees matched
kind: MatcherKind<'tt>, /// against the relevant metavar by the black box parser. An element will be a `MatchedSeq` if
/// the corresponding metavar decl is within a sequence.
matches: Lrc<NamedMatchVec>,
} }
// This type is used a lot. Make sure it doesn't unintentionally get bigger. // This type is used a lot. Make sure it doesn't unintentionally get bigger.
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))] #[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
rustc_data_structures::static_assert_size!(MatcherPos<'_>, 64); rustc_data_structures::static_assert_size!(MatcherPos, 16);
impl<'tt> MatcherPos<'tt> { impl MatcherPos {
fn top_level(matcher: &'tt [TokenTree], empty_matches: Lrc<NamedMatchVec>) -> Self { /// Adds `m` as a named match for the `metavar_idx`-th metavar. There are only two call sites,
MatcherPos { /// and both are hot enough to be always worth inlining.
tts: matcher, #[inline(always)]
idx: 0, fn push_match(&mut self, metavar_idx: usize, seq_depth: usize, m: NamedMatch) {
matches: empty_matches,
seq_depth: 0,
match_cur: 0,
kind: MatcherKind::TopLevel,
}
}
fn empty_sequence(
parent_mp: &MatcherPos<'tt>,
seq: &'tt SequenceRepetition,
empty_matches: Lrc<NamedMatchVec>,
) -> Self {
let mut mp = MatcherPos {
tts: parent_mp.tts,
idx: parent_mp.idx + 1,
matches: parent_mp.matches.clone(), // a cheap clone
seq_depth: parent_mp.seq_depth,
match_cur: parent_mp.match_cur + seq.num_captures,
kind: parent_mp.kind.clone(), // an expensive clone
};
for idx in parent_mp.match_cur..parent_mp.match_cur + seq.num_captures {
mp.push_match(idx, MatchedSeq(empty_matches.clone()));
}
mp
}
fn sequence(
parent_mp: Box<MatcherPos<'tt>>,
seq: &'tt SequenceRepetition,
empty_matches: Lrc<NamedMatchVec>,
) -> Self {
let seq_kind = box SequenceSubmatcher {
parent: Parent { tts: parent_mp.tts, idx: parent_mp.idx, kind: parent_mp.kind },
seq,
};
let mut mp = MatcherPos {
tts: &seq.tts,
idx: 0,
matches: parent_mp.matches,
seq_depth: parent_mp.seq_depth,
match_cur: parent_mp.match_cur,
kind: MatcherKind::Sequence(seq_kind),
};
// Start with an empty vec for each metavar within the sequence. Note that `mp.seq_depth`
// must have the parent's depth at this point for these `push_match` calls to work.
for idx in mp.match_cur..mp.match_cur + seq.num_captures {
mp.push_match(idx, MatchedSeq(empty_matches.clone()));
}
mp.seq_depth += 1;
mp
}
/// Adds `m` as a named match for the `idx`-th metavar.
fn push_match(&mut self, idx: usize, m: NamedMatch) {
let matches = Lrc::make_mut(&mut self.matches); let matches = Lrc::make_mut(&mut self.matches);
match self.seq_depth { match seq_depth {
0 => { 0 => {
// We are not within a sequence. Just append `m`. // We are not within a sequence. Just append `m`.
assert_eq!(idx, matches.len()); assert_eq!(metavar_idx, matches.len());
matches.push(m); matches.push(m);
} }
_ => { _ => {
// We are within a sequence. Find the final `MatchedSeq` at the appropriate depth // We are within a sequence. Find the final `MatchedSeq` at the appropriate depth
// and append `m` to its vector. // and append `m` to its vector.
let mut curr = &mut matches[idx]; let mut curr = &mut matches[metavar_idx];
for _ in 0..self.seq_depth - 1 { for _ in 0..seq_depth - 1 {
match curr { match curr {
MatchedSeq(seq) => { MatchedSeq(seq) => {
let seq = Lrc::make_mut(seq); let seq = Lrc::make_mut(seq);
@ -255,9 +191,9 @@ fn push_match(&mut self, idx: usize, m: NamedMatch) {
} }
} }
enum EofMatcherPositions<'tt> { enum EofMatcherPositions {
None, None,
One(Box<MatcherPos<'tt>>), One(MatcherPos),
Multiple, Multiple,
} }
@ -349,63 +285,6 @@ pub(super) fn count_metavar_decls(matcher: &[TokenTree]) -> usize {
MatchedNonterminal(Lrc<Nonterminal>), MatchedNonterminal(Lrc<Nonterminal>),
} }
fn nameize<I: Iterator<Item = NamedMatch>>(
sess: &ParseSess,
matcher: &[TokenTree],
mut res: I,
) -> NamedParseResult {
// Recursively descend into each type of matcher (e.g., sequences, delimited, metavars) and make
// sure that each metavar has _exactly one_ binding. If a metavar does not have exactly one
// binding, then there is an error. If it does, then we insert the binding into the
// `NamedParseResult`.
fn n_rec<I: Iterator<Item = NamedMatch>>(
sess: &ParseSess,
tt: &TokenTree,
res: &mut I,
ret_val: &mut FxHashMap<MacroRulesNormalizedIdent, NamedMatch>,
) -> Result<(), (rustc_span::Span, String)> {
match *tt {
TokenTree::Sequence(_, ref seq) => {
for next_m in &seq.tts {
n_rec(sess, next_m, res.by_ref(), ret_val)?
}
}
TokenTree::Delimited(_, ref delim) => {
for next_m in delim.inner_tts() {
n_rec(sess, next_m, res.by_ref(), ret_val)?;
}
}
TokenTree::MetaVarDecl(span, _, None) => {
if sess.missing_fragment_specifiers.borrow_mut().remove(&span).is_some() {
return Err((span, "missing fragment specifier".to_string()));
}
}
TokenTree::MetaVarDecl(sp, bind_name, _) => match ret_val
.entry(MacroRulesNormalizedIdent::new(bind_name))
{
Vacant(spot) => {
spot.insert(res.next().unwrap());
}
Occupied(..) => return Err((sp, format!("duplicated bind name: {}", bind_name))),
},
TokenTree::Token(..) => (),
TokenTree::MetaVar(..) | TokenTree::MetaVarExpr(..) => unreachable!(),
}
Ok(())
}
let mut ret_val = FxHashMap::default();
for tt in matcher {
match n_rec(sess, tt, res.by_ref(), &mut ret_val) {
Ok(_) => {}
Err((sp, msg)) => return Error(sp, msg),
}
}
Success(ret_val)
}
/// Performs a token equality check, ignoring syntax context (that is, an unhygienic comparison) /// Performs a token equality check, ignoring syntax context (that is, an unhygienic comparison)
fn token_name_eq(t1: &Token, t2: &Token) -> bool { fn token_name_eq(t1: &Token, t2: &Token) -> bool {
if let (Some((ident1, is_raw1)), Some((ident2, is_raw2))) = (t1.ident(), t2.ident()) { if let (Some((ident1, is_raw1)), Some((ident2, is_raw2))) = (t1.ident(), t2.ident()) {
@ -417,21 +296,24 @@ fn token_name_eq(t1: &Token, t2: &Token) -> bool {
} }
} }
// Note: the position vectors could be created and dropped within `parse_tt`, but to avoid excess // Note: the vectors could be created and dropped within `parse_tt`, but to avoid excess
// allocations we have a single vector fo each kind that is cleared and reused repeatedly. // allocations we have a single vector fo each kind that is cleared and reused repeatedly.
pub struct TtParser<'tt> { pub struct TtParser<'tt> {
macro_name: Ident, macro_name: Ident,
/// The matcher of the current rule.
locs: Vec<MatcherLoc<'tt>>,
/// The set of current mps to be processed. This should be empty by the end of a successful /// The set of current mps to be processed. This should be empty by the end of a successful
/// execution of `parse_tt_inner`. /// execution of `parse_tt_inner`.
cur_mps: Vec<Box<MatcherPos<'tt>>>, cur_mps: Vec<MatcherPos>,
/// The set of newly generated mps. These are used to replenish `cur_mps` in the function /// The set of newly generated mps. These are used to replenish `cur_mps` in the function
/// `parse_tt`. /// `parse_tt`.
next_mps: Vec<Box<MatcherPos<'tt>>>, next_mps: Vec<MatcherPos>,
/// The set of mps that are waiting for the black-box parser. /// The set of mps that are waiting for the black-box parser.
bb_mps: Vec<Box<MatcherPos<'tt>>>, bb_mps: Vec<MatcherPos>,
/// Pre-allocate an empty match array, so it can be cloned cheaply for macros with many rules /// Pre-allocate an empty match array, so it can be cloned cheaply for macros with many rules
/// that have no metavars. /// that have no metavars.
@ -442,6 +324,7 @@ impl<'tt> TtParser<'tt> {
pub(super) fn new(macro_name: Ident) -> TtParser<'tt> { pub(super) fn new(macro_name: Ident) -> TtParser<'tt> {
TtParser { TtParser {
macro_name, macro_name,
locs: vec![],
cur_mps: vec![], cur_mps: vec![],
next_mps: vec![], next_mps: vec![],
bb_mps: vec![], bb_mps: vec![],
@ -449,6 +332,99 @@ pub(super) fn new(macro_name: Ident) -> TtParser<'tt> {
} }
} }
/// Convert a `&[TokenTree]` to a `&[MatcherLoc]`. Note: this conversion happens every time the
/// macro is called, which may be many times if there are many call sites or if it is
/// recursive. This conversion is fairly cheap and the representation is sufficiently better
/// for matching than `&[TokenTree]` that it's a clear performance win even with the overhead.
/// But it might be possible to move the conversion outwards so it only occurs once per macro.
fn compute_locs(
&mut self,
sess: &ParseSess,
matcher: &'tt [TokenTree],
) -> Result<usize, (Span, String)> {
fn inner<'tt>(
sess: &ParseSess,
tts: &'tt [TokenTree],
locs: &mut Vec<MatcherLoc<'tt>>,
next_metavar: &mut usize,
seq_depth: usize,
) -> Result<(), (Span, String)> {
for tt in tts {
match tt {
TokenTree::Token(token) => {
locs.push(MatcherLoc::Token { token });
}
TokenTree::Delimited(_, delimited) => {
locs.push(MatcherLoc::Delimited);
inner(sess, &delimited.all_tts, locs, next_metavar, seq_depth)?;
}
TokenTree::Sequence(_, seq) => {
// We can't determine `idx_first_after` and construct the final
// `MatcherLoc::Sequence` until after `inner()` is called and the sequence
// end pieces are processed. So we push a dummy value (`Eof` is cheapest to
// construct) now, and overwrite it with the proper value below.
let dummy = MatcherLoc::Eof;
locs.push(dummy);
let next_metavar_orig = *next_metavar;
let op = seq.kleene.op;
let idx_first = locs.len();
let idx_seq = idx_first - 1;
inner(sess, &seq.tts, locs, next_metavar, seq_depth + 1)?;
if let Some(separator) = &seq.separator {
locs.push(MatcherLoc::SequenceSep { separator });
locs.push(MatcherLoc::SequenceKleeneOpAfterSep { idx_first });
} else {
locs.push(MatcherLoc::SequenceKleeneOpNoSep { op, idx_first });
}
// Overwrite the dummy value pushed above with the proper value.
locs[idx_seq] = MatcherLoc::Sequence {
op,
num_metavar_decls: seq.num_captures,
idx_first_after: locs.len(),
next_metavar: next_metavar_orig,
seq_depth,
};
}
&TokenTree::MetaVarDecl(span, bind, kind) => {
if let Some(kind) = kind {
locs.push(MatcherLoc::MetaVarDecl {
span,
bind,
kind,
next_metavar: *next_metavar,
seq_depth,
});
*next_metavar += 1;
} else if sess
.missing_fragment_specifiers
.borrow_mut()
.remove(&span)
.is_some()
{
// E.g. `$e` instead of `$e:expr`.
return Err((span, "missing fragment specifier".to_string()));
}
}
TokenTree::MetaVar(..) | TokenTree::MetaVarExpr(..) => unreachable!(),
}
}
Ok(())
}
self.locs.clear();
let mut next_metavar = 0;
inner(sess, matcher, &mut self.locs, &mut next_metavar, /* seq_depth */ 0)?;
// A final entry is needed for eof.
self.locs.push(MatcherLoc::Eof);
// This is the number of metavar decls.
Ok(next_metavar)
}
/// Process the matcher positions of `cur_mps` until it is empty. In the process, this will /// Process the matcher positions of `cur_mps` until it is empty. In the process, this will
/// produce more mps in `next_mps` and `bb_mps`. /// produce more mps in `next_mps` and `bb_mps`.
/// ///
@ -458,8 +434,7 @@ pub(super) fn new(macro_name: Ident) -> TtParser<'tt> {
/// track of through the mps generated. /// track of through the mps generated.
fn parse_tt_inner( fn parse_tt_inner(
&mut self, &mut self,
sess: &ParseSess, num_metavar_decls: usize,
matcher: &[TokenTree],
token: &Token, token: &Token,
) -> Option<NamedParseResult> { ) -> Option<NamedParseResult> {
// Matcher positions that would be valid if the macro invocation was over now. Only // Matcher positions that would be valid if the macro invocation was over now. Only
@ -467,72 +442,53 @@ fn parse_tt_inner(
let mut eof_mps = EofMatcherPositions::None; let mut eof_mps = EofMatcherPositions::None;
while let Some(mut mp) = self.cur_mps.pop() { while let Some(mut mp) = self.cur_mps.pop() {
// Get the current position of the "dot" (`idx`) in `mp` and the number of token match &self.locs[mp.idx] {
// trees in the matcher (`len`). &MatcherLoc::Sequence {
let idx = mp.idx; op,
let len = mp.tts.len(); num_metavar_decls,
idx_first_after,
if idx < len { next_metavar,
// We are in the middle of a matcher. Compare the matcher's current tt against seq_depth,
// `token`. } => {
match &mp.tts[idx] { // Install an empty vec for each metavar within the sequence.
TokenTree::Sequence(_sp, seq) => { for metavar_idx in next_metavar..next_metavar + num_metavar_decls {
let op = seq.kleene.op; mp.push_match(
if op == mbe::KleeneOp::ZeroOrMore || op == mbe::KleeneOp::ZeroOrOne { metavar_idx,
// Allow for the possibility of zero matches of this sequence. seq_depth,
self.cur_mps.push(box MatcherPos::empty_sequence( MatchedSeq(self.empty_matches.clone()),
&*mp, );
&seq,
self.empty_matches.clone(),
));
} }
// Allow for the possibility of one or more matches of this sequence. if op == KleeneOp::ZeroOrMore || op == KleeneOp::ZeroOrOne {
self.cur_mps.push(box MatcherPos::sequence( // Try zero matches of this sequence, by skipping over it.
mp, self.cur_mps.push(MatcherPos {
&seq, idx: idx_first_after,
self.empty_matches.clone(), matches: mp.matches.clone(), // a cheap clone
)); });
} }
&TokenTree::MetaVarDecl(span, _, None) => { // Try one or more matches of this sequence, by entering it.
// E.g. `$e` instead of `$e:expr`. mp.idx += 1;
if sess.missing_fragment_specifiers.borrow_mut().remove(&span).is_some() { self.cur_mps.push(mp);
return Some(Error(span, "missing fragment specifier".to_string()));
} }
} MatcherLoc::MetaVarDecl { kind, .. } => {
&TokenTree::MetaVarDecl(_, _, Some(kind)) => {
// Built-in nonterminals never start with these tokens, so we can eliminate // Built-in nonterminals never start with these tokens, so we can eliminate
// them from consideration. // them from consideration. We use the span of the metavariable declaration
// // to determine any edition-specific matching behavior for non-terminals.
// We use the span of the metavariable declaration to determine any if Parser::nonterminal_may_begin_with(*kind, token) {
// edition-specific matching behavior for non-terminals.
if Parser::nonterminal_may_begin_with(kind, token) {
self.bb_mps.push(mp); self.bb_mps.push(mp);
} }
} }
MatcherLoc::Delimited => {
TokenTree::Delimited(_, delimited) => { // Entering the delimeter is trivial.
// To descend into a delimited submatcher, we update `mp` appropriately, mp.idx += 1;
// including enough information to re-ascend afterwards, and push it onto
// `cur_mps`. Later, when we reach the closing delimiter, we will recover
// the parent matcher position to ascend. Note that we use `all_tts` to
// include the open and close delimiter tokens.
let kind = MatcherKind::Delimited(box DelimitedSubmatcher {
parent: Parent { tts: mp.tts, idx: mp.idx, kind: mp.kind },
});
mp.tts = &delimited.all_tts;
mp.idx = 0;
mp.kind = kind;
self.cur_mps.push(mp); self.cur_mps.push(mp);
} }
MatcherLoc::Token { token: t } => {
TokenTree::Token(t) => { // If it's a doc comment, we just ignore it and move on to the next tt in the
// If it's a doc comment, we just ignore it and move on to the next tt in // matcher. This is a bug, but #95267 showed that existing programs rely on
// the matcher. This is a bug, but #95267 showed that existing programs // this behaviour, and changing it would require some care and a transition
// rely on this behaviour, and changing it would require some care and a // period.
// transition period.
// //
// If the token matches, we can just advance the parser. // If the token matches, we can just advance the parser.
// //
@ -542,71 +498,51 @@ fn parse_tt_inner(
mp.idx += 1; mp.idx += 1;
self.cur_mps.push(mp); self.cur_mps.push(mp);
} else if token_name_eq(&t, token) { } else if token_name_eq(&t, token) {
if let TokenKind::CloseDelim(_) = token.kind {
// Ascend out of the delimited submatcher.
debug_assert_eq!(idx, len - 1);
match mp.kind {
MatcherKind::Delimited(submatcher) => {
mp.tts = submatcher.parent.tts;
mp.idx = submatcher.parent.idx;
mp.kind = submatcher.parent.kind;
}
_ => unreachable!(),
}
}
mp.idx += 1; mp.idx += 1;
self.next_mps.push(mp); self.next_mps.push(mp);
} }
} }
&MatcherLoc::SequenceKleeneOpNoSep { op, idx_first } => {
// These cannot appear in a matcher. // We are past the end of a sequence with no separator. Try ending the
TokenTree::MetaVar(..) | TokenTree::MetaVarExpr(..) => unreachable!(), // sequence. If that's not possible, `ending_mp` will fail quietly when it is
}
} else if let MatcherKind::Sequence(box SequenceSubmatcher { parent, seq }) = &mp.kind {
// We are past the end of a sequence.
// - If it has no separator, we must be only one past the end.
// - If it has a separator, we may be one past the end, in which case we must
// look for a separator. Or we may be two past the end, in which case we have
// already dealt with the separator.
debug_assert!(idx == len || idx == len + 1 && seq.separator.is_some());
if idx == len {
// Sequence matching may have finished: move the "dot" past the sequence in
// `parent`. This applies whether a separator is used or not. If sequence
// matching hasn't finished, this `new_mp` will fail quietly when it is
// processed next time around the loop. // processed next time around the loop.
let new_mp = box MatcherPos { let ending_mp = MatcherPos {
tts: parent.tts, idx: mp.idx + 1, // +1 skips the Kleene op
idx: parent.idx + 1,
matches: mp.matches.clone(), // a cheap clone matches: mp.matches.clone(), // a cheap clone
seq_depth: mp.seq_depth - 1,
match_cur: mp.match_cur,
kind: parent.kind.clone(), // an expensive clone
}; };
self.cur_mps.push(new_mp); self.cur_mps.push(ending_mp);
}
if seq.separator.is_some() && idx == len { if op != KleeneOp::ZeroOrOne {
// Look for the separator. // Try another repetition.
if seq.separator.as_ref().map_or(false, |sep| token_name_eq(token, sep)) { mp.idx = idx_first;
// The matcher has a separator, and it matches the current token. We can
// advance past the separator token.
mp.idx += 1;
self.next_mps.push(mp);
}
} else if seq.kleene.op != mbe::KleeneOp::ZeroOrOne {
// We don't need to look for a separator: either this sequence doesn't have
// one, or it does and we've already handled it. Also, we are allowed to have
// more than one repetition. Move the "dot" back to the beginning of the
// matcher and try to match again.
mp.match_cur -= seq.num_captures;
mp.idx = 0;
self.cur_mps.push(mp); self.cur_mps.push(mp);
} }
} else { }
// We are past the end of the matcher, and not in a sequence. Look for end of MatcherLoc::SequenceSep { separator } => {
// input. // We are past the end of a sequence with a separator but we haven't seen the
debug_assert_eq!(idx, len); // separator yet. Try ending the sequence. If that's not possible, `ending_mp`
// will fail quietly when it is processed next time around the loop.
let ending_mp = MatcherPos {
idx: mp.idx + 2, // +2 skips the separator and the Kleene op
matches: mp.matches.clone(), // a cheap clone
};
self.cur_mps.push(ending_mp);
if token_name_eq(token, separator) {
// The separator matches the current token. Advance past it.
mp.idx += 1;
self.next_mps.push(mp);
}
}
&MatcherLoc::SequenceKleeneOpAfterSep { idx_first } => {
// We are past the sequence separator. This can't be a `?` Kleene op, because
// they don't permit separators. Try another repetition.
mp.idx = idx_first;
self.cur_mps.push(mp);
}
MatcherLoc::Eof => {
// We are past the matcher's end, and not in a sequence. Try to end things.
debug_assert_eq!(mp.idx, self.locs.len() - 1);
if *token == token::Eof { if *token == token::Eof {
eof_mps = match eof_mps { eof_mps = match eof_mps {
EofMatcherPositions::None => EofMatcherPositions::One(mp), EofMatcherPositions::None => EofMatcherPositions::One(mp),
@ -617,17 +553,18 @@ fn parse_tt_inner(
} }
} }
} }
}
// If we reached the end of input, check that there is EXACTLY ONE possible matcher. // If we reached the end of input, check that there is EXACTLY ONE possible matcher.
// Otherwise, either the parse is ambiguous (which is an error) or there is a syntax error. // Otherwise, either the parse is ambiguous (which is an error) or there is a syntax error.
if *token == token::Eof { if *token == token::Eof {
Some(match eof_mps { Some(match eof_mps {
EofMatcherPositions::One(mut eof_mp) => { EofMatcherPositions::One(mut eof_mp) => {
assert_eq!(eof_mp.matches.len(), count_metavar_decls(matcher)); assert_eq!(eof_mp.matches.len(), num_metavar_decls);
// Need to take ownership of the matches from within the `Lrc`. // Need to take ownership of the matches from within the `Lrc`.
Lrc::make_mut(&mut eof_mp.matches); Lrc::make_mut(&mut eof_mp.matches);
let matches = Lrc::try_unwrap(eof_mp.matches).unwrap().into_iter(); let matches = Lrc::try_unwrap(eof_mp.matches).unwrap().into_iter();
nameize(sess, matcher, matches) self.nameize(matches)
} }
EofMatcherPositions::Multiple => { EofMatcherPositions::Multiple => {
Error(token.span, "ambiguity: multiple successful parses".to_string()) Error(token.span, "ambiguity: multiple successful parses".to_string())
@ -651,13 +588,18 @@ pub(super) fn parse_tt(
parser: &mut Cow<'_, Parser<'_>>, parser: &mut Cow<'_, Parser<'_>>,
matcher: &'tt [TokenTree], matcher: &'tt [TokenTree],
) -> NamedParseResult { ) -> NamedParseResult {
let num_metavar_decls = match self.compute_locs(parser.sess, matcher) {
Ok(num_metavar_decls) => num_metavar_decls,
Err((span, msg)) => return Error(span, msg),
};
// A queue of possible matcher positions. We initialize it with the matcher position in // A queue of possible matcher positions. We initialize it with the matcher position in
// which the "dot" is before the first token of the first token tree in `matcher`. // which the "dot" is before the first token of the first token tree in `matcher`.
// `parse_tt_inner` then processes all of these possible matcher positions and produces // `parse_tt_inner` then processes all of these possible matcher positions and produces
// possible next positions into `next_mps`. After some post-processing, the contents of // possible next positions into `next_mps`. After some post-processing, the contents of
// `next_mps` replenish `cur_mps` and we start over again. // `next_mps` replenish `cur_mps` and we start over again.
self.cur_mps.clear(); self.cur_mps.clear();
self.cur_mps.push(box MatcherPos::top_level(matcher, self.empty_matches.clone())); self.cur_mps.push(MatcherPos { idx: 0, matches: self.empty_matches.clone() });
loop { loop {
self.next_mps.clear(); self.next_mps.clear();
@ -665,8 +607,8 @@ pub(super) fn parse_tt(
// Process `cur_mps` until either we have finished the input or we need to get some // Process `cur_mps` until either we have finished the input or we need to get some
// parsing from the black-box parser done. // parsing from the black-box parser done.
if let Some(result) = self.parse_tt_inner(parser.sess, matcher, &parser.token) { if let Some(res) = self.parse_tt_inner(num_metavar_decls, &parser.token) {
return result; return res;
} }
// `parse_tt_inner` handled all of `cur_mps`, so it's empty. // `parse_tt_inner` handled all of `cur_mps`, so it's empty.
@ -693,8 +635,11 @@ pub(super) fn parse_tt(
(0, 1) => { (0, 1) => {
// We need to call the black-box parser to get some nonterminal. // We need to call the black-box parser to get some nonterminal.
let mut mp = self.bb_mps.pop().unwrap(); let mut mp = self.bb_mps.pop().unwrap();
if let TokenTree::MetaVarDecl(span, _, Some(kind)) = mp.tts[mp.idx] { let loc = &self.locs[mp.idx];
let match_cur = mp.match_cur; if let &MatcherLoc::MetaVarDecl {
span, kind, next_metavar, seq_depth, ..
} = loc
{
// We use the span of the metavariable declaration to determine any // We use the span of the metavariable declaration to determine any
// edition-specific matching behavior for non-terminals. // edition-specific matching behavior for non-terminals.
let nt = match parser.to_mut().parse_nonterminal(kind) { let nt = match parser.to_mut().parse_nonterminal(kind) {
@ -714,9 +659,8 @@ pub(super) fn parse_tt(
NtOrTt::Nt(nt) => MatchedNonterminal(Lrc::new(nt)), NtOrTt::Nt(nt) => MatchedNonterminal(Lrc::new(nt)),
NtOrTt::Tt(tt) => MatchedTokenTree(tt), NtOrTt::Tt(tt) => MatchedTokenTree(tt),
}; };
mp.push_match(match_cur, m); mp.push_match(next_metavar, seq_depth, m);
mp.idx += 1; mp.idx += 1;
mp.match_cur += 1;
} else { } else {
unreachable!() unreachable!()
} }
@ -737,11 +681,9 @@ fn ambiguity_error(&self, token_span: rustc_span::Span) -> NamedParseResult {
let nts = self let nts = self
.bb_mps .bb_mps
.iter() .iter()
.map(|mp| match mp.tts[mp.idx] { .map(|mp| match &self.locs[mp.idx] {
TokenTree::MetaVarDecl(_, bind, Some(kind)) => { MatcherLoc::MetaVarDecl { bind, kind, .. } => format!("{} ('{}')", kind, bind),
format!("{} ('{}')", kind, bind) _ => unreachable!(),
}
_ => panic!(),
}) })
.collect::<Vec<String>>() .collect::<Vec<String>>()
.join(" or "); .join(" or ");
@ -759,4 +701,19 @@ fn ambiguity_error(&self, token_span: rustc_span::Span) -> NamedParseResult {
), ),
) )
} }
fn nameize<I: Iterator<Item = NamedMatch>>(&self, mut res: I) -> NamedParseResult {
// Make that each metavar has _exactly one_ binding. If so, insert the binding into the
// `NamedParseResult`. Otherwise, it's an error.
let mut ret_val = FxHashMap::default();
for loc in self.locs.iter() {
if let &MatcherLoc::MetaVarDecl { span, bind, .. } = loc {
match ret_val.entry(MacroRulesNormalizedIdent::new(bind)) {
Vacant(spot) => spot.insert(res.next().unwrap()),
Occupied(..) => return Error(span, format!("duplicated bind name: {}", bind)),
};
}
}
Success(ret_val)
}
} }