Document the macro parser a little more.
This commit is contained in:
parent
9297d1f00a
commit
e54acbf848
@ -13,14 +13,72 @@ import ast_util::mk_sp;
|
||||
import std::map::{hashmap, uint_hash};
|
||||
|
||||
/* This is an Earley-like parser, without support for in-grammar nonterminals,
|
||||
onlyl calling out to the main rust parser for named nonterminals (which it
|
||||
only by calling out to the main rust parser for named nonterminals (which it
|
||||
commits to fully when it hits one in a grammar). This means that there are no
|
||||
completer or predictor rules, and therefore no need to store one column per
|
||||
token: instead, there's a set of current Earley items and a set of next
|
||||
ones. Instead of NTs, we have a special case for Kleene star. The big-O, in
|
||||
pathological cases, is worse than traditional Earley parsing, but it's an
|
||||
easier fit for Macro-by-Example-style rules, and I think the overhead is
|
||||
lower. */
|
||||
lower. (In order to prevent the pathological case, we'd need to lazily
|
||||
construct the resulting `named_match`es at the very end. It'd be a pain,
|
||||
and require more memory to keep around old items, but it would also save
|
||||
overhead)*/
|
||||
|
||||
/* Quick intro to how the parser works:
|
||||
|
||||
A 'position' is a dot in the middle of a matcher, usually represented as a
|
||||
dot. For example `· a $( a )* a b` is a position, as is `a $( · a )* a b`.
|
||||
|
||||
The parser walks through the input a character at a time, maintaining a list
|
||||
of items consistent with the current position in the input string: `cur_eis`.
|
||||
|
||||
As it processes them, it fills up `eof_eis` with items that would be valid if
|
||||
the macro invocation is now over, `bb_eis` with items that are waiting on
|
||||
a Rust nonterminal like `$e:expr`, and `next_eis` with items that are waiting
|
||||
on the a particular token. Most of the logic concerns moving the · through the
|
||||
repetitions indicated by Kleene stars. It only advances or calls out to the
|
||||
real Rust parser when no `cur_eis` items remain
|
||||
|
||||
Example: Start parsing `a a a a b` against [· a $( a )* a b].
|
||||
|
||||
Remaining input: `a a a a b`
|
||||
next_eis: [· a $( a )* a b]
|
||||
|
||||
- - - Advance over an `a`. - - -
|
||||
|
||||
Remaining input: `a a a b`
|
||||
cur: [a · $( a )* a b]
|
||||
Descend/Skip (first item).
|
||||
next: [a $( · a )* a b] [a $( a )* · a b].
|
||||
|
||||
- - - Advance over an `a`. - - -
|
||||
|
||||
Remaining input: `a a b`
|
||||
cur: [a $( a · )* a b] next: [a $( a )* a · b]
|
||||
Finish/Repeat (first item)
|
||||
next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
|
||||
|
||||
- - - Advance over an `a`. - - - (this looks exactly like the last step)
|
||||
|
||||
Remaining input: `a b`
|
||||
cur: [a $( a · )* a b] next: [a $( a )* a · b]
|
||||
Finish/Repeat (first item)
|
||||
next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
|
||||
|
||||
- - - Advance over an `a`. - - - (this looks exactly like the last step)
|
||||
|
||||
Remaining input: `b`
|
||||
cur: [a $( a · )* a b] next: [a $( a )* a · b]
|
||||
Finish/Repeat (first item)
|
||||
next: [a $( a )* · a b] [a $( · a )* a b]
|
||||
|
||||
- - - Advance over a `b`. - - -
|
||||
|
||||
Remaining input: ``
|
||||
eof: [a $( a )* a b ·]
|
||||
|
||||
*/
|
||||
|
||||
|
||||
/* to avoid costly uniqueness checks, we require that `match_seq` always has a
|
||||
|
Loading…
x
Reference in New Issue
Block a user