A new matcher representation for use in parse_tt
.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this is a bad representation for the traversal. - `TokenTree` is nested, and there's a bunch of expensive and fiddly state required to handle entering and exiting nested submatchers. - There are three positions (sequence separators, sequence Kleene ops, and end of the matcher) that are represented by an index that exceeds the end of the `&[TokenTree]`, which is clumsy and error-prone. This commit introduces a new representation called `MatcherLoc` that is designed specifically for matching. It fixes all the above problems, making the code much easier to read. A `&[TokenTree]` is converted to a `&[MatcherLoc]` before matching begins. Despite the cost of the conversion, it's still a net performance win, because various pieces of traversal state are computed once up-front, rather than having to be recomputed repeatedly during the macro matching. Some improvements worth noting. - `parse_tt_inner` is *much* easier to read. No more having to compare `idx` against `len` and read comments to understand what the result means. - The handling of `Delimited` in `parse_tt_inner` is now trivial. - The three end-of-sequence cases in `parse_tt_inner` are now handled in three separate match arms, and the control flow is much simpler. - `nameize` is no longer recursive. - There were two places that issued "missing fragment specifier" errors: one in `parse_tt_inner()`, and one in `nameize()`. Presumably the latter was never executed. There's now a single place issuing these errors, in `compute_locs()`. - The number of heap allocations done for a `check full` build of `async-std-1.10.0` (an extreme example of heavy macro use) drops from 11.8M to 2.6M, and most of these occur outside of macro matching. - The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough that it no longer needs boxing, which partly accounts for the reduction in allocations. - The rest of the drop in allocations is due to the removal of `MatcherKind`, because we no longer need to record anything for the parent matcher when entering a submatcher. - Overall it reduces code size by 45 lines.
This commit is contained in:
parent
2ad4eb207b
commit
88f8fbcce0
@ -1,7 +1,5 @@
|
||||
#![feature(associated_type_bounds)]
|
||||
#![feature(associated_type_defaults)]
|
||||
#![feature(box_patterns)]
|
||||
#![feature(box_syntax)]
|
||||
#![feature(crate_visibility_modifier)]
|
||||
#![feature(decl_macro)]
|
||||
#![feature(if_let_guard)]
|
||||
|
@ -17,7 +17,7 @@
|
||||
//! A "matcher position" (a.k.a. "position" or "mp") is a dot in the middle of a matcher, usually
|
||||
//! written as a `·`. For example `· a $( a )* a b` is one, as is `a $( · a )* a b`.
|
||||
//!
|
||||
//! The parser walks through the input a character at a time, maintaining a list
|
||||
//! The parser walks through the input a token at a time, maintaining a list
|
||||
//! of threads consistent with the current position in the input string: `cur_mps`.
|
||||
//!
|
||||
//! As it processes them, it fills up `eof_mps` with threads that would be valid if
|
||||
@ -73,12 +73,13 @@
|
||||
crate use NamedMatch::*;
|
||||
crate use ParseResult::*;
|
||||
|
||||
use crate::mbe::{self, SequenceRepetition, TokenTree};
|
||||
use crate::mbe::{KleeneOp, TokenTree};
|
||||
|
||||
use rustc_ast::token::{self, DocComment, Nonterminal, Token, TokenKind};
|
||||
use rustc_ast::token::{self, DocComment, Nonterminal, NonterminalKind, Token};
|
||||
use rustc_parse::parser::{NtOrTt, Parser};
|
||||
use rustc_session::parse::ParseSess;
|
||||
use rustc_span::symbol::MacroRulesNormalizedIdent;
|
||||
use rustc_span::Span;
|
||||
|
||||
use smallvec::{smallvec, SmallVec};
|
||||
|
||||
@ -88,153 +89,88 @@
|
||||
use std::borrow::Cow;
|
||||
use std::collections::hash_map::Entry::{Occupied, Vacant};
|
||||
|
||||
// One element is enough to cover 95-99% of vectors for most benchmarks. Also,
|
||||
// vectors longer than one frequently have many elements, not just two or
|
||||
// three.
|
||||
// One element is enough to cover 95-99% of vectors for most benchmarks. Also, vectors longer than
|
||||
// one frequently have many elements, not just two or three.
|
||||
type NamedMatchVec = SmallVec<[NamedMatch; 1]>;
|
||||
|
||||
// This type is used a lot. Make sure it doesn't unintentionally get bigger.
|
||||
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
|
||||
rustc_data_structures::static_assert_size!(NamedMatchVec, 48);
|
||||
|
||||
#[derive(Clone)]
|
||||
enum MatcherKind<'tt> {
|
||||
TopLevel,
|
||||
Delimited(Box<DelimitedSubmatcher<'tt>>),
|
||||
Sequence(Box<SequenceSubmatcher<'tt>>),
|
||||
/// A unit within a matcher that a `MatcherPos` can refer to. Similar to (and derived from)
|
||||
/// `mbe::TokenTree`, but designed specifically for fast and easy traversal during matching.
|
||||
/// Notable differences to `mbe::TokenTree`:
|
||||
/// - It is non-recursive, i.e. there is no nesting.
|
||||
/// - The end pieces of each sequence (the separator, if present, and the Kleene op) are
|
||||
/// represented explicitly, as is the very end of the matcher.
|
||||
///
|
||||
/// This means a matcher can be represented by `&[MatcherLoc]`, and traversal mostly involves
|
||||
/// simply incrementing the current matcher position index by one.
|
||||
enum MatcherLoc<'tt> {
|
||||
Token {
|
||||
token: &'tt Token,
|
||||
},
|
||||
Delimited,
|
||||
Sequence {
|
||||
op: KleeneOp,
|
||||
num_metavar_decls: usize,
|
||||
idx_first_after: usize,
|
||||
next_metavar: usize,
|
||||
seq_depth: usize,
|
||||
},
|
||||
SequenceKleeneOpNoSep {
|
||||
op: KleeneOp,
|
||||
idx_first: usize,
|
||||
},
|
||||
SequenceSep {
|
||||
separator: &'tt Token,
|
||||
},
|
||||
SequenceKleeneOpAfterSep {
|
||||
idx_first: usize,
|
||||
},
|
||||
MetaVarDecl {
|
||||
span: Span,
|
||||
bind: Ident,
|
||||
kind: NonterminalKind,
|
||||
next_metavar: usize,
|
||||
seq_depth: usize,
|
||||
},
|
||||
Eof,
|
||||
}
|
||||
|
||||
#[derive(Clone)]
|
||||
struct DelimitedSubmatcher<'tt> {
|
||||
parent: Parent<'tt>,
|
||||
}
|
||||
|
||||
#[derive(Clone)]
|
||||
struct SequenceSubmatcher<'tt> {
|
||||
parent: Parent<'tt>,
|
||||
seq: &'tt SequenceRepetition,
|
||||
}
|
||||
|
||||
/// Data used to ascend from a submatcher back to its parent matcher. A subset of the fields from
|
||||
/// `MathcherPos`.
|
||||
#[derive(Clone)]
|
||||
struct Parent<'tt> {
|
||||
tts: &'tt [TokenTree],
|
||||
idx: usize,
|
||||
kind: MatcherKind<'tt>,
|
||||
}
|
||||
|
||||
/// A single matcher position, which could be within the top-level matcher, a submatcher, a
|
||||
/// subsubmatcher, etc. For example:
|
||||
/// ```text
|
||||
/// macro_rules! m { $id:ident ( $($e:expr),* ) } => { ... }
|
||||
/// <----------> second submatcher; one tt, one metavar
|
||||
/// <--------------> first submatcher; three tts, zero metavars
|
||||
/// <--------------------------> top-level matcher; two tts, one metavar
|
||||
/// ```
|
||||
struct MatcherPos<'tt> {
|
||||
/// The tokens that make up the current matcher. When we are within a `Sequence` or `Delimited`
|
||||
/// submatcher, this is just the contents of that submatcher.
|
||||
tts: &'tt [TokenTree],
|
||||
|
||||
/// The "dot" position within the current submatcher, i.e. the index into `tts`. Can go one or
|
||||
/// two positions past the final elements in `tts` when dealing with sequences, see
|
||||
/// `parse_tt_inner` for details.
|
||||
/// A single matcher position, representing the state of matching.
|
||||
struct MatcherPos {
|
||||
/// The index into `TtParser::locs`, which represents the "dot".
|
||||
idx: usize,
|
||||
|
||||
/// This vector ends up with one element per metavar in the *top-level* matcher, even when this
|
||||
/// `MatcherPos` is for a submatcher. Each element records token trees matched against the
|
||||
/// relevant metavar by the black box parser. The element will be a `MatchedSeq` if the
|
||||
/// corresponding metavar is within a sequence.
|
||||
/// The matches made against metavar decls so far. On a successful match, this vector ends up
|
||||
/// with one element per metavar decl in the matcher. Each element records token trees matched
|
||||
/// against the relevant metavar by the black box parser. An element will be a `MatchedSeq` if
|
||||
/// the corresponding metavar decl is within a sequence.
|
||||
matches: Lrc<NamedMatchVec>,
|
||||
|
||||
/// The number of sequences this mp is within.
|
||||
seq_depth: usize,
|
||||
|
||||
/// The position in `matches` of the next metavar to be matched against the source token
|
||||
/// stream. Should not be used if there are no metavars.
|
||||
match_cur: usize,
|
||||
|
||||
/// What kind of matcher we are in. For submatchers, this contains enough information to
|
||||
/// reconstitute a `MatcherPos` within the parent once we ascend out of the submatcher.
|
||||
kind: MatcherKind<'tt>,
|
||||
}
|
||||
|
||||
// This type is used a lot. Make sure it doesn't unintentionally get bigger.
|
||||
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
|
||||
rustc_data_structures::static_assert_size!(MatcherPos<'_>, 64);
|
||||
rustc_data_structures::static_assert_size!(MatcherPos, 16);
|
||||
|
||||
impl<'tt> MatcherPos<'tt> {
|
||||
fn top_level(matcher: &'tt [TokenTree], empty_matches: Lrc<NamedMatchVec>) -> Self {
|
||||
MatcherPos {
|
||||
tts: matcher,
|
||||
idx: 0,
|
||||
matches: empty_matches,
|
||||
seq_depth: 0,
|
||||
match_cur: 0,
|
||||
kind: MatcherKind::TopLevel,
|
||||
}
|
||||
}
|
||||
|
||||
fn empty_sequence(
|
||||
parent_mp: &MatcherPos<'tt>,
|
||||
seq: &'tt SequenceRepetition,
|
||||
empty_matches: Lrc<NamedMatchVec>,
|
||||
) -> Self {
|
||||
let mut mp = MatcherPos {
|
||||
tts: parent_mp.tts,
|
||||
idx: parent_mp.idx + 1,
|
||||
matches: parent_mp.matches.clone(), // a cheap clone
|
||||
seq_depth: parent_mp.seq_depth,
|
||||
match_cur: parent_mp.match_cur + seq.num_captures,
|
||||
kind: parent_mp.kind.clone(), // an expensive clone
|
||||
};
|
||||
for idx in parent_mp.match_cur..parent_mp.match_cur + seq.num_captures {
|
||||
mp.push_match(idx, MatchedSeq(empty_matches.clone()));
|
||||
}
|
||||
mp
|
||||
}
|
||||
|
||||
fn sequence(
|
||||
parent_mp: Box<MatcherPos<'tt>>,
|
||||
seq: &'tt SequenceRepetition,
|
||||
empty_matches: Lrc<NamedMatchVec>,
|
||||
) -> Self {
|
||||
let seq_kind = box SequenceSubmatcher {
|
||||
parent: Parent { tts: parent_mp.tts, idx: parent_mp.idx, kind: parent_mp.kind },
|
||||
seq,
|
||||
};
|
||||
let mut mp = MatcherPos {
|
||||
tts: &seq.tts,
|
||||
idx: 0,
|
||||
matches: parent_mp.matches,
|
||||
seq_depth: parent_mp.seq_depth,
|
||||
match_cur: parent_mp.match_cur,
|
||||
kind: MatcherKind::Sequence(seq_kind),
|
||||
};
|
||||
// Start with an empty vec for each metavar within the sequence. Note that `mp.seq_depth`
|
||||
// must have the parent's depth at this point for these `push_match` calls to work.
|
||||
for idx in mp.match_cur..mp.match_cur + seq.num_captures {
|
||||
mp.push_match(idx, MatchedSeq(empty_matches.clone()));
|
||||
}
|
||||
mp.seq_depth += 1;
|
||||
mp
|
||||
}
|
||||
|
||||
/// Adds `m` as a named match for the `idx`-th metavar.
|
||||
fn push_match(&mut self, idx: usize, m: NamedMatch) {
|
||||
impl MatcherPos {
|
||||
/// Adds `m` as a named match for the `metavar_idx`-th metavar. There are only two call sites,
|
||||
/// and both are hot enough to be always worth inlining.
|
||||
#[inline(always)]
|
||||
fn push_match(&mut self, metavar_idx: usize, seq_depth: usize, m: NamedMatch) {
|
||||
let matches = Lrc::make_mut(&mut self.matches);
|
||||
match self.seq_depth {
|
||||
match seq_depth {
|
||||
0 => {
|
||||
// We are not within a sequence. Just append `m`.
|
||||
assert_eq!(idx, matches.len());
|
||||
assert_eq!(metavar_idx, matches.len());
|
||||
matches.push(m);
|
||||
}
|
||||
_ => {
|
||||
// We are within a sequence. Find the final `MatchedSeq` at the appropriate depth
|
||||
// and append `m` to its vector.
|
||||
let mut curr = &mut matches[idx];
|
||||
for _ in 0..self.seq_depth - 1 {
|
||||
let mut curr = &mut matches[metavar_idx];
|
||||
for _ in 0..seq_depth - 1 {
|
||||
match curr {
|
||||
MatchedSeq(seq) => {
|
||||
let seq = Lrc::make_mut(seq);
|
||||
@ -255,9 +191,9 @@ fn push_match(&mut self, idx: usize, m: NamedMatch) {
|
||||
}
|
||||
}
|
||||
|
||||
enum EofMatcherPositions<'tt> {
|
||||
enum EofMatcherPositions {
|
||||
None,
|
||||
One(Box<MatcherPos<'tt>>),
|
||||
One(MatcherPos),
|
||||
Multiple,
|
||||
}
|
||||
|
||||
@ -349,63 +285,6 @@ pub(super) fn count_metavar_decls(matcher: &[TokenTree]) -> usize {
|
||||
MatchedNonterminal(Lrc<Nonterminal>),
|
||||
}
|
||||
|
||||
fn nameize<I: Iterator<Item = NamedMatch>>(
|
||||
sess: &ParseSess,
|
||||
matcher: &[TokenTree],
|
||||
mut res: I,
|
||||
) -> NamedParseResult {
|
||||
// Recursively descend into each type of matcher (e.g., sequences, delimited, metavars) and make
|
||||
// sure that each metavar has _exactly one_ binding. If a metavar does not have exactly one
|
||||
// binding, then there is an error. If it does, then we insert the binding into the
|
||||
// `NamedParseResult`.
|
||||
fn n_rec<I: Iterator<Item = NamedMatch>>(
|
||||
sess: &ParseSess,
|
||||
tt: &TokenTree,
|
||||
res: &mut I,
|
||||
ret_val: &mut FxHashMap<MacroRulesNormalizedIdent, NamedMatch>,
|
||||
) -> Result<(), (rustc_span::Span, String)> {
|
||||
match *tt {
|
||||
TokenTree::Sequence(_, ref seq) => {
|
||||
for next_m in &seq.tts {
|
||||
n_rec(sess, next_m, res.by_ref(), ret_val)?
|
||||
}
|
||||
}
|
||||
TokenTree::Delimited(_, ref delim) => {
|
||||
for next_m in delim.inner_tts() {
|
||||
n_rec(sess, next_m, res.by_ref(), ret_val)?;
|
||||
}
|
||||
}
|
||||
TokenTree::MetaVarDecl(span, _, None) => {
|
||||
if sess.missing_fragment_specifiers.borrow_mut().remove(&span).is_some() {
|
||||
return Err((span, "missing fragment specifier".to_string()));
|
||||
}
|
||||
}
|
||||
TokenTree::MetaVarDecl(sp, bind_name, _) => match ret_val
|
||||
.entry(MacroRulesNormalizedIdent::new(bind_name))
|
||||
{
|
||||
Vacant(spot) => {
|
||||
spot.insert(res.next().unwrap());
|
||||
}
|
||||
Occupied(..) => return Err((sp, format!("duplicated bind name: {}", bind_name))),
|
||||
},
|
||||
TokenTree::Token(..) => (),
|
||||
TokenTree::MetaVar(..) | TokenTree::MetaVarExpr(..) => unreachable!(),
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
let mut ret_val = FxHashMap::default();
|
||||
for tt in matcher {
|
||||
match n_rec(sess, tt, res.by_ref(), &mut ret_val) {
|
||||
Ok(_) => {}
|
||||
Err((sp, msg)) => return Error(sp, msg),
|
||||
}
|
||||
}
|
||||
|
||||
Success(ret_val)
|
||||
}
|
||||
|
||||
/// Performs a token equality check, ignoring syntax context (that is, an unhygienic comparison)
|
||||
fn token_name_eq(t1: &Token, t2: &Token) -> bool {
|
||||
if let (Some((ident1, is_raw1)), Some((ident2, is_raw2))) = (t1.ident(), t2.ident()) {
|
||||
@ -417,21 +296,24 @@ fn token_name_eq(t1: &Token, t2: &Token) -> bool {
|
||||
}
|
||||
}
|
||||
|
||||
// Note: the position vectors could be created and dropped within `parse_tt`, but to avoid excess
|
||||
// Note: the vectors could be created and dropped within `parse_tt`, but to avoid excess
|
||||
// allocations we have a single vector fo each kind that is cleared and reused repeatedly.
|
||||
pub struct TtParser<'tt> {
|
||||
macro_name: Ident,
|
||||
|
||||
/// The matcher of the current rule.
|
||||
locs: Vec<MatcherLoc<'tt>>,
|
||||
|
||||
/// The set of current mps to be processed. This should be empty by the end of a successful
|
||||
/// execution of `parse_tt_inner`.
|
||||
cur_mps: Vec<Box<MatcherPos<'tt>>>,
|
||||
cur_mps: Vec<MatcherPos>,
|
||||
|
||||
/// The set of newly generated mps. These are used to replenish `cur_mps` in the function
|
||||
/// `parse_tt`.
|
||||
next_mps: Vec<Box<MatcherPos<'tt>>>,
|
||||
next_mps: Vec<MatcherPos>,
|
||||
|
||||
/// The set of mps that are waiting for the black-box parser.
|
||||
bb_mps: Vec<Box<MatcherPos<'tt>>>,
|
||||
bb_mps: Vec<MatcherPos>,
|
||||
|
||||
/// Pre-allocate an empty match array, so it can be cloned cheaply for macros with many rules
|
||||
/// that have no metavars.
|
||||
@ -442,6 +324,7 @@ impl<'tt> TtParser<'tt> {
|
||||
pub(super) fn new(macro_name: Ident) -> TtParser<'tt> {
|
||||
TtParser {
|
||||
macro_name,
|
||||
locs: vec![],
|
||||
cur_mps: vec![],
|
||||
next_mps: vec![],
|
||||
bb_mps: vec![],
|
||||
@ -449,6 +332,99 @@ pub(super) fn new(macro_name: Ident) -> TtParser<'tt> {
|
||||
}
|
||||
}
|
||||
|
||||
/// Convert a `&[TokenTree]` to a `&[MatcherLoc]`. Note: this conversion happens every time the
|
||||
/// macro is called, which may be many times if there are many call sites or if it is
|
||||
/// recursive. This conversion is fairly cheap and the representation is sufficiently better
|
||||
/// for matching than `&[TokenTree]` that it's a clear performance win even with the overhead.
|
||||
/// But it might be possible to move the conversion outwards so it only occurs once per macro.
|
||||
fn compute_locs(
|
||||
&mut self,
|
||||
sess: &ParseSess,
|
||||
matcher: &'tt [TokenTree],
|
||||
) -> Result<usize, (Span, String)> {
|
||||
fn inner<'tt>(
|
||||
sess: &ParseSess,
|
||||
tts: &'tt [TokenTree],
|
||||
locs: &mut Vec<MatcherLoc<'tt>>,
|
||||
next_metavar: &mut usize,
|
||||
seq_depth: usize,
|
||||
) -> Result<(), (Span, String)> {
|
||||
for tt in tts {
|
||||
match tt {
|
||||
TokenTree::Token(token) => {
|
||||
locs.push(MatcherLoc::Token { token });
|
||||
}
|
||||
TokenTree::Delimited(_, delimited) => {
|
||||
locs.push(MatcherLoc::Delimited);
|
||||
inner(sess, &delimited.all_tts, locs, next_metavar, seq_depth)?;
|
||||
}
|
||||
TokenTree::Sequence(_, seq) => {
|
||||
// We can't determine `idx_first_after` and construct the final
|
||||
// `MatcherLoc::Sequence` until after `inner()` is called and the sequence
|
||||
// end pieces are processed. So we push a dummy value (`Eof` is cheapest to
|
||||
// construct) now, and overwrite it with the proper value below.
|
||||
let dummy = MatcherLoc::Eof;
|
||||
locs.push(dummy);
|
||||
|
||||
let next_metavar_orig = *next_metavar;
|
||||
let op = seq.kleene.op;
|
||||
let idx_first = locs.len();
|
||||
let idx_seq = idx_first - 1;
|
||||
inner(sess, &seq.tts, locs, next_metavar, seq_depth + 1)?;
|
||||
|
||||
if let Some(separator) = &seq.separator {
|
||||
locs.push(MatcherLoc::SequenceSep { separator });
|
||||
locs.push(MatcherLoc::SequenceKleeneOpAfterSep { idx_first });
|
||||
} else {
|
||||
locs.push(MatcherLoc::SequenceKleeneOpNoSep { op, idx_first });
|
||||
}
|
||||
|
||||
// Overwrite the dummy value pushed above with the proper value.
|
||||
locs[idx_seq] = MatcherLoc::Sequence {
|
||||
op,
|
||||
num_metavar_decls: seq.num_captures,
|
||||
idx_first_after: locs.len(),
|
||||
next_metavar: next_metavar_orig,
|
||||
seq_depth,
|
||||
};
|
||||
}
|
||||
&TokenTree::MetaVarDecl(span, bind, kind) => {
|
||||
if let Some(kind) = kind {
|
||||
locs.push(MatcherLoc::MetaVarDecl {
|
||||
span,
|
||||
bind,
|
||||
kind,
|
||||
next_metavar: *next_metavar,
|
||||
seq_depth,
|
||||
});
|
||||
*next_metavar += 1;
|
||||
} else if sess
|
||||
.missing_fragment_specifiers
|
||||
.borrow_mut()
|
||||
.remove(&span)
|
||||
.is_some()
|
||||
{
|
||||
// E.g. `$e` instead of `$e:expr`.
|
||||
return Err((span, "missing fragment specifier".to_string()));
|
||||
}
|
||||
}
|
||||
TokenTree::MetaVar(..) | TokenTree::MetaVarExpr(..) => unreachable!(),
|
||||
}
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
self.locs.clear();
|
||||
let mut next_metavar = 0;
|
||||
inner(sess, matcher, &mut self.locs, &mut next_metavar, /* seq_depth */ 0)?;
|
||||
|
||||
// A final entry is needed for eof.
|
||||
self.locs.push(MatcherLoc::Eof);
|
||||
|
||||
// This is the number of metavar decls.
|
||||
Ok(next_metavar)
|
||||
}
|
||||
|
||||
/// Process the matcher positions of `cur_mps` until it is empty. In the process, this will
|
||||
/// produce more mps in `next_mps` and `bb_mps`.
|
||||
///
|
||||
@ -458,8 +434,7 @@ pub(super) fn new(macro_name: Ident) -> TtParser<'tt> {
|
||||
/// track of through the mps generated.
|
||||
fn parse_tt_inner(
|
||||
&mut self,
|
||||
sess: &ParseSess,
|
||||
matcher: &[TokenTree],
|
||||
num_metavar_decls: usize,
|
||||
token: &Token,
|
||||
) -> Option<NamedParseResult> {
|
||||
// Matcher positions that would be valid if the macro invocation was over now. Only
|
||||
@ -467,151 +442,113 @@ fn parse_tt_inner(
|
||||
let mut eof_mps = EofMatcherPositions::None;
|
||||
|
||||
while let Some(mut mp) = self.cur_mps.pop() {
|
||||
// Get the current position of the "dot" (`idx`) in `mp` and the number of token
|
||||
// trees in the matcher (`len`).
|
||||
let idx = mp.idx;
|
||||
let len = mp.tts.len();
|
||||
|
||||
if idx < len {
|
||||
// We are in the middle of a matcher. Compare the matcher's current tt against
|
||||
// `token`.
|
||||
match &mp.tts[idx] {
|
||||
TokenTree::Sequence(_sp, seq) => {
|
||||
let op = seq.kleene.op;
|
||||
if op == mbe::KleeneOp::ZeroOrMore || op == mbe::KleeneOp::ZeroOrOne {
|
||||
// Allow for the possibility of zero matches of this sequence.
|
||||
self.cur_mps.push(box MatcherPos::empty_sequence(
|
||||
&*mp,
|
||||
&seq,
|
||||
self.empty_matches.clone(),
|
||||
));
|
||||
}
|
||||
|
||||
// Allow for the possibility of one or more matches of this sequence.
|
||||
self.cur_mps.push(box MatcherPos::sequence(
|
||||
mp,
|
||||
&seq,
|
||||
self.empty_matches.clone(),
|
||||
));
|
||||
match &self.locs[mp.idx] {
|
||||
&MatcherLoc::Sequence {
|
||||
op,
|
||||
num_metavar_decls,
|
||||
idx_first_after,
|
||||
next_metavar,
|
||||
seq_depth,
|
||||
} => {
|
||||
// Install an empty vec for each metavar within the sequence.
|
||||
for metavar_idx in next_metavar..next_metavar + num_metavar_decls {
|
||||
mp.push_match(
|
||||
metavar_idx,
|
||||
seq_depth,
|
||||
MatchedSeq(self.empty_matches.clone()),
|
||||
);
|
||||
}
|
||||
|
||||
&TokenTree::MetaVarDecl(span, _, None) => {
|
||||
// E.g. `$e` instead of `$e:expr`.
|
||||
if sess.missing_fragment_specifiers.borrow_mut().remove(&span).is_some() {
|
||||
return Some(Error(span, "missing fragment specifier".to_string()));
|
||||
}
|
||||
}
|
||||
|
||||
&TokenTree::MetaVarDecl(_, _, Some(kind)) => {
|
||||
// Built-in nonterminals never start with these tokens, so we can eliminate
|
||||
// them from consideration.
|
||||
//
|
||||
// We use the span of the metavariable declaration to determine any
|
||||
// edition-specific matching behavior for non-terminals.
|
||||
if Parser::nonterminal_may_begin_with(kind, token) {
|
||||
self.bb_mps.push(mp);
|
||||
}
|
||||
}
|
||||
|
||||
TokenTree::Delimited(_, delimited) => {
|
||||
// To descend into a delimited submatcher, we update `mp` appropriately,
|
||||
// including enough information to re-ascend afterwards, and push it onto
|
||||
// `cur_mps`. Later, when we reach the closing delimiter, we will recover
|
||||
// the parent matcher position to ascend. Note that we use `all_tts` to
|
||||
// include the open and close delimiter tokens.
|
||||
let kind = MatcherKind::Delimited(box DelimitedSubmatcher {
|
||||
parent: Parent { tts: mp.tts, idx: mp.idx, kind: mp.kind },
|
||||
if op == KleeneOp::ZeroOrMore || op == KleeneOp::ZeroOrOne {
|
||||
// Try zero matches of this sequence, by skipping over it.
|
||||
self.cur_mps.push(MatcherPos {
|
||||
idx: idx_first_after,
|
||||
matches: mp.matches.clone(), // a cheap clone
|
||||
});
|
||||
mp.tts = &delimited.all_tts;
|
||||
mp.idx = 0;
|
||||
mp.kind = kind;
|
||||
}
|
||||
|
||||
// Try one or more matches of this sequence, by entering it.
|
||||
mp.idx += 1;
|
||||
self.cur_mps.push(mp);
|
||||
}
|
||||
MatcherLoc::MetaVarDecl { kind, .. } => {
|
||||
// Built-in nonterminals never start with these tokens, so we can eliminate
|
||||
// them from consideration. We use the span of the metavariable declaration
|
||||
// to determine any edition-specific matching behavior for non-terminals.
|
||||
if Parser::nonterminal_may_begin_with(*kind, token) {
|
||||
self.bb_mps.push(mp);
|
||||
}
|
||||
}
|
||||
MatcherLoc::Delimited => {
|
||||
// Entering the delimeter is trivial.
|
||||
mp.idx += 1;
|
||||
self.cur_mps.push(mp);
|
||||
}
|
||||
MatcherLoc::Token { token: t } => {
|
||||
// If it's a doc comment, we just ignore it and move on to the next tt in the
|
||||
// matcher. This is a bug, but #95267 showed that existing programs rely on
|
||||
// this behaviour, and changing it would require some care and a transition
|
||||
// period.
|
||||
//
|
||||
// If the token matches, we can just advance the parser.
|
||||
//
|
||||
// Otherwise, this match has failed, there is nothing to do, and hopefully
|
||||
// another mp in `cur_mps` will match.
|
||||
if matches!(t, Token { kind: DocComment(..), .. }) {
|
||||
mp.idx += 1;
|
||||
self.cur_mps.push(mp);
|
||||
}
|
||||
|
||||
TokenTree::Token(t) => {
|
||||
// If it's a doc comment, we just ignore it and move on to the next tt in
|
||||
// the matcher. This is a bug, but #95267 showed that existing programs
|
||||
// rely on this behaviour, and changing it would require some care and a
|
||||
// transition period.
|
||||
//
|
||||
// If the token matches, we can just advance the parser.
|
||||
//
|
||||
// Otherwise, this match has failed, there is nothing to do, and hopefully
|
||||
// another mp in `cur_mps` will match.
|
||||
if matches!(t, Token { kind: DocComment(..), .. }) {
|
||||
mp.idx += 1;
|
||||
self.cur_mps.push(mp);
|
||||
} else if token_name_eq(&t, token) {
|
||||
if let TokenKind::CloseDelim(_) = token.kind {
|
||||
// Ascend out of the delimited submatcher.
|
||||
debug_assert_eq!(idx, len - 1);
|
||||
match mp.kind {
|
||||
MatcherKind::Delimited(submatcher) => {
|
||||
mp.tts = submatcher.parent.tts;
|
||||
mp.idx = submatcher.parent.idx;
|
||||
mp.kind = submatcher.parent.kind;
|
||||
}
|
||||
_ => unreachable!(),
|
||||
}
|
||||
}
|
||||
mp.idx += 1;
|
||||
self.next_mps.push(mp);
|
||||
}
|
||||
}
|
||||
|
||||
// These cannot appear in a matcher.
|
||||
TokenTree::MetaVar(..) | TokenTree::MetaVarExpr(..) => unreachable!(),
|
||||
}
|
||||
} else if let MatcherKind::Sequence(box SequenceSubmatcher { parent, seq }) = &mp.kind {
|
||||
// We are past the end of a sequence.
|
||||
// - If it has no separator, we must be only one past the end.
|
||||
// - If it has a separator, we may be one past the end, in which case we must
|
||||
// look for a separator. Or we may be two past the end, in which case we have
|
||||
// already dealt with the separator.
|
||||
debug_assert!(idx == len || idx == len + 1 && seq.separator.is_some());
|
||||
|
||||
if idx == len {
|
||||
// Sequence matching may have finished: move the "dot" past the sequence in
|
||||
// `parent`. This applies whether a separator is used or not. If sequence
|
||||
// matching hasn't finished, this `new_mp` will fail quietly when it is
|
||||
// processed next time around the loop.
|
||||
let new_mp = box MatcherPos {
|
||||
tts: parent.tts,
|
||||
idx: parent.idx + 1,
|
||||
matches: mp.matches.clone(), // a cheap clone
|
||||
seq_depth: mp.seq_depth - 1,
|
||||
match_cur: mp.match_cur,
|
||||
kind: parent.kind.clone(), // an expensive clone
|
||||
};
|
||||
self.cur_mps.push(new_mp);
|
||||
}
|
||||
|
||||
if seq.separator.is_some() && idx == len {
|
||||
// Look for the separator.
|
||||
if seq.separator.as_ref().map_or(false, |sep| token_name_eq(token, sep)) {
|
||||
// The matcher has a separator, and it matches the current token. We can
|
||||
// advance past the separator token.
|
||||
} else if token_name_eq(&t, token) {
|
||||
mp.idx += 1;
|
||||
self.next_mps.push(mp);
|
||||
}
|
||||
} else if seq.kleene.op != mbe::KleeneOp::ZeroOrOne {
|
||||
// We don't need to look for a separator: either this sequence doesn't have
|
||||
// one, or it does and we've already handled it. Also, we are allowed to have
|
||||
// more than one repetition. Move the "dot" back to the beginning of the
|
||||
// matcher and try to match again.
|
||||
mp.match_cur -= seq.num_captures;
|
||||
mp.idx = 0;
|
||||
}
|
||||
&MatcherLoc::SequenceKleeneOpNoSep { op, idx_first } => {
|
||||
// We are past the end of a sequence with no separator. Try ending the
|
||||
// sequence. If that's not possible, `ending_mp` will fail quietly when it is
|
||||
// processed next time around the loop.
|
||||
let ending_mp = MatcherPos {
|
||||
idx: mp.idx + 1, // +1 skips the Kleene op
|
||||
matches: mp.matches.clone(), // a cheap clone
|
||||
};
|
||||
self.cur_mps.push(ending_mp);
|
||||
|
||||
if op != KleeneOp::ZeroOrOne {
|
||||
// Try another repetition.
|
||||
mp.idx = idx_first;
|
||||
self.cur_mps.push(mp);
|
||||
}
|
||||
}
|
||||
MatcherLoc::SequenceSep { separator } => {
|
||||
// We are past the end of a sequence with a separator but we haven't seen the
|
||||
// separator yet. Try ending the sequence. If that's not possible, `ending_mp`
|
||||
// will fail quietly when it is processed next time around the loop.
|
||||
let ending_mp = MatcherPos {
|
||||
idx: mp.idx + 2, // +2 skips the separator and the Kleene op
|
||||
matches: mp.matches.clone(), // a cheap clone
|
||||
};
|
||||
self.cur_mps.push(ending_mp);
|
||||
|
||||
if token_name_eq(token, separator) {
|
||||
// The separator matches the current token. Advance past it.
|
||||
mp.idx += 1;
|
||||
self.next_mps.push(mp);
|
||||
}
|
||||
}
|
||||
&MatcherLoc::SequenceKleeneOpAfterSep { idx_first } => {
|
||||
// We are past the sequence separator. This can't be a `?` Kleene op, because
|
||||
// they don't permit separators. Try another repetition.
|
||||
mp.idx = idx_first;
|
||||
self.cur_mps.push(mp);
|
||||
}
|
||||
} else {
|
||||
// We are past the end of the matcher, and not in a sequence. Look for end of
|
||||
// input.
|
||||
debug_assert_eq!(idx, len);
|
||||
if *token == token::Eof {
|
||||
eof_mps = match eof_mps {
|
||||
EofMatcherPositions::None => EofMatcherPositions::One(mp),
|
||||
EofMatcherPositions::One(_) | EofMatcherPositions::Multiple => {
|
||||
EofMatcherPositions::Multiple
|
||||
MatcherLoc::Eof => {
|
||||
// We are past the matcher's end, and not in a sequence. Try to end things.
|
||||
debug_assert_eq!(mp.idx, self.locs.len() - 1);
|
||||
if *token == token::Eof {
|
||||
eof_mps = match eof_mps {
|
||||
EofMatcherPositions::None => EofMatcherPositions::One(mp),
|
||||
EofMatcherPositions::One(_) | EofMatcherPositions::Multiple => {
|
||||
EofMatcherPositions::Multiple
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -623,11 +560,11 @@ fn parse_tt_inner(
|
||||
if *token == token::Eof {
|
||||
Some(match eof_mps {
|
||||
EofMatcherPositions::One(mut eof_mp) => {
|
||||
assert_eq!(eof_mp.matches.len(), count_metavar_decls(matcher));
|
||||
assert_eq!(eof_mp.matches.len(), num_metavar_decls);
|
||||
// Need to take ownership of the matches from within the `Lrc`.
|
||||
Lrc::make_mut(&mut eof_mp.matches);
|
||||
let matches = Lrc::try_unwrap(eof_mp.matches).unwrap().into_iter();
|
||||
nameize(sess, matcher, matches)
|
||||
self.nameize(matches)
|
||||
}
|
||||
EofMatcherPositions::Multiple => {
|
||||
Error(token.span, "ambiguity: multiple successful parses".to_string())
|
||||
@ -651,13 +588,18 @@ pub(super) fn parse_tt(
|
||||
parser: &mut Cow<'_, Parser<'_>>,
|
||||
matcher: &'tt [TokenTree],
|
||||
) -> NamedParseResult {
|
||||
let num_metavar_decls = match self.compute_locs(parser.sess, matcher) {
|
||||
Ok(num_metavar_decls) => num_metavar_decls,
|
||||
Err((span, msg)) => return Error(span, msg),
|
||||
};
|
||||
|
||||
// A queue of possible matcher positions. We initialize it with the matcher position in
|
||||
// which the "dot" is before the first token of the first token tree in `matcher`.
|
||||
// `parse_tt_inner` then processes all of these possible matcher positions and produces
|
||||
// possible next positions into `next_mps`. After some post-processing, the contents of
|
||||
// `next_mps` replenish `cur_mps` and we start over again.
|
||||
self.cur_mps.clear();
|
||||
self.cur_mps.push(box MatcherPos::top_level(matcher, self.empty_matches.clone()));
|
||||
self.cur_mps.push(MatcherPos { idx: 0, matches: self.empty_matches.clone() });
|
||||
|
||||
loop {
|
||||
self.next_mps.clear();
|
||||
@ -665,8 +607,8 @@ pub(super) fn parse_tt(
|
||||
|
||||
// Process `cur_mps` until either we have finished the input or we need to get some
|
||||
// parsing from the black-box parser done.
|
||||
if let Some(result) = self.parse_tt_inner(parser.sess, matcher, &parser.token) {
|
||||
return result;
|
||||
if let Some(res) = self.parse_tt_inner(num_metavar_decls, &parser.token) {
|
||||
return res;
|
||||
}
|
||||
|
||||
// `parse_tt_inner` handled all of `cur_mps`, so it's empty.
|
||||
@ -693,8 +635,11 @@ pub(super) fn parse_tt(
|
||||
(0, 1) => {
|
||||
// We need to call the black-box parser to get some nonterminal.
|
||||
let mut mp = self.bb_mps.pop().unwrap();
|
||||
if let TokenTree::MetaVarDecl(span, _, Some(kind)) = mp.tts[mp.idx] {
|
||||
let match_cur = mp.match_cur;
|
||||
let loc = &self.locs[mp.idx];
|
||||
if let &MatcherLoc::MetaVarDecl {
|
||||
span, kind, next_metavar, seq_depth, ..
|
||||
} = loc
|
||||
{
|
||||
// We use the span of the metavariable declaration to determine any
|
||||
// edition-specific matching behavior for non-terminals.
|
||||
let nt = match parser.to_mut().parse_nonterminal(kind) {
|
||||
@ -714,9 +659,8 @@ pub(super) fn parse_tt(
|
||||
NtOrTt::Nt(nt) => MatchedNonterminal(Lrc::new(nt)),
|
||||
NtOrTt::Tt(tt) => MatchedTokenTree(tt),
|
||||
};
|
||||
mp.push_match(match_cur, m);
|
||||
mp.push_match(next_metavar, seq_depth, m);
|
||||
mp.idx += 1;
|
||||
mp.match_cur += 1;
|
||||
} else {
|
||||
unreachable!()
|
||||
}
|
||||
@ -737,11 +681,9 @@ fn ambiguity_error(&self, token_span: rustc_span::Span) -> NamedParseResult {
|
||||
let nts = self
|
||||
.bb_mps
|
||||
.iter()
|
||||
.map(|mp| match mp.tts[mp.idx] {
|
||||
TokenTree::MetaVarDecl(_, bind, Some(kind)) => {
|
||||
format!("{} ('{}')", kind, bind)
|
||||
}
|
||||
_ => panic!(),
|
||||
.map(|mp| match &self.locs[mp.idx] {
|
||||
MatcherLoc::MetaVarDecl { bind, kind, .. } => format!("{} ('{}')", kind, bind),
|
||||
_ => unreachable!(),
|
||||
})
|
||||
.collect::<Vec<String>>()
|
||||
.join(" or ");
|
||||
@ -759,4 +701,19 @@ fn ambiguity_error(&self, token_span: rustc_span::Span) -> NamedParseResult {
|
||||
),
|
||||
)
|
||||
}
|
||||
|
||||
fn nameize<I: Iterator<Item = NamedMatch>>(&self, mut res: I) -> NamedParseResult {
|
||||
// Make that each metavar has _exactly one_ binding. If so, insert the binding into the
|
||||
// `NamedParseResult`. Otherwise, it's an error.
|
||||
let mut ret_val = FxHashMap::default();
|
||||
for loc in self.locs.iter() {
|
||||
if let &MatcherLoc::MetaVarDecl { span, bind, .. } = loc {
|
||||
match ret_val.entry(MacroRulesNormalizedIdent::new(bind)) {
|
||||
Vacant(spot) => spot.insert(res.next().unwrap()),
|
||||
Occupied(..) => return Error(span, format!("duplicated bind name: {}", bind)),
|
||||
};
|
||||
}
|
||||
}
|
||||
Success(ret_val)
|
||||
}
|
||||
}
|
||||
|
Loading…
Reference in New Issue
Block a user