diff --git a/src/libsyntax/ext/tt/earley_parser.rs b/src/libsyntax/ext/tt/earley_parser.rs index c9a6928a72d..04f0e5f0a83 100644 --- a/src/libsyntax/ext/tt/earley_parser.rs +++ b/src/libsyntax/ext/tt/earley_parser.rs @@ -13,14 +13,72 @@ import ast_util::mk_sp; import std::map::{hashmap, uint_hash}; /* This is an Earley-like parser, without support for in-grammar nonterminals, -onlyl calling out to the main rust parser for named nonterminals (which it +only by calling out to the main rust parser for named nonterminals (which it commits to fully when it hits one in a grammar). This means that there are no completer or predictor rules, and therefore no need to store one column per token: instead, there's a set of current Earley items and a set of next ones. Instead of NTs, we have a special case for Kleene star. The big-O, in pathological cases, is worse than traditional Earley parsing, but it's an easier fit for Macro-by-Example-style rules, and I think the overhead is -lower. */ +lower. (In order to prevent the pathological case, we'd need to lazily +construct the resulting `named_match`es at the very end. It'd be a pain, +and require more memory to keep around old items, but it would also save +overhead)*/ + +/* Quick intro to how the parser works: + +A 'position' is a dot in the middle of a matcher, usually represented as a +dot. For example `· a $( a )* a b` is a position, as is `a $( · a )* a b`. + +The parser walks through the input a character at a time, maintaining a list +of items consistent with the current position in the input string: `cur_eis`. + +As it processes them, it fills up `eof_eis` with items that would be valid if +the macro invocation is now over, `bb_eis` with items that are waiting on +a Rust nonterminal like `$e:expr`, and `next_eis` with items that are waiting +on the a particular token. Most of the logic concerns moving the · through the +repetitions indicated by Kleene stars. It only advances or calls out to the +real Rust parser when no `cur_eis` items remain + +Example: Start parsing `a a a a b` against [· a $( a )* a b]. + +Remaining input: `a a a a b` +next_eis: [· a $( a )* a b] + +- - - Advance over an `a`. - - - + +Remaining input: `a a a b` +cur: [a · $( a )* a b] +Descend/Skip (first item). +next: [a $( · a )* a b] [a $( a )* · a b]. + +- - - Advance over an `a`. - - - + +Remaining input: `a a b` +cur: [a $( a · )* a b] next: [a $( a )* a · b] +Finish/Repeat (first item) +next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b] + +- - - Advance over an `a`. - - - (this looks exactly like the last step) + +Remaining input: `a b` +cur: [a $( a · )* a b] next: [a $( a )* a · b] +Finish/Repeat (first item) +next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b] + +- - - Advance over an `a`. - - - (this looks exactly like the last step) + +Remaining input: `b` +cur: [a $( a · )* a b] next: [a $( a )* a · b] +Finish/Repeat (first item) +next: [a $( a )* · a b] [a $( · a )* a b] + +- - - Advance over a `b`. - - - + +Remaining input: `` +eof: [a $( a )* a b ·] + + */ /* to avoid costly uniqueness checks, we require that `match_seq` always has a